MetaPaperLens needs your prompt to instruct the AI model to emit two
specific JSON structures: an evidence array and an
extraction_confidence object. The viewer reads these to
render PDF highlights and per-block confidence badges. Without them
the extraction will still run, but those panels stay empty.
If your prompt doesn't yet ask for these structures, copy the
examples below into your prompt body. Substitute the placeholder
field names (<your_field>, <data_block_1>)
with the actual top-level keys your prompt asks the model to produce.
evidence array
Every sample / result object must carry an evidence array.
Each entry has exactly four keys: snippet,
page, source, and field. The
field value is a JSON path identifying which extracted
value the evidence supports — formatted to mirror the JSON shape your
prompt asks for.
"evidence": [
{
"snippet": "TABLE 1. Descriptive statistics by group...",
"page": 3,
"source": "Table 1",
"field": "samples[0].<your_field>"
},
{
"snippet": "the sample comprised 147 non-clinical adolescents",
"page": 2,
"source": null,
"field": "samples[0].sample_id"
}
]
Rules:
snippet must be verbatim from the paper — character-for-character, no paraphrase, no ellipses.page is the sequential PDF page number (1 = first page), as an integer. Never omit.source is the table / figure identifier (e.g. "Table 2") or null.field is the JSON path into your output, e.g. "samples[0].demographics._table[0]".extraction_confidence object
A top-level extraction_confidence object (sibling of
samples, summaries, etc.) with one entry per
major extracted-data block in your output. Each entry has a
level rated high / medium /
low, plus a notes string when not high.
"extraction_confidence": {
"<data_block_1>": { "level": "high" },
"<data_block_2>": {
"level": "medium",
"notes": "country inferred from author affiliations; not stated explicitly"
},
"<data_block_3>": {
"level": "low",
"notes": "values reported only for the overall scale; subgroup entries estimated"
}
}
Rules:
level ∈ {"high", "medium", "low"}.notes (≤ 200 chars) is required on medium and low; optional on high.samples, not inside evidence, not per-record.evidence itself or for extraction_confidence itself.evidence array alongside your main data field, with the four-key spec above.extraction_confidence object at the top level of the output, with one entry per major data block.
Why this matters: the renderer matches each evidence entry's
field path back to the JSON you produced, then highlights
the corresponding page region. The confidence object drives the
per-block badges on the results panel. Both depend on the structures
being present and JSON-shaped — keyword mentions alone aren't enough.