LOOK

This will look for new pathways amongst the old in the literature. Untested. v0.1

Mar 24, 2026

Feed REFERENCES.TXT and LOOK into your AI and see if it does anything. Could be crap.

>>>>>>>>>>>>>> write Part 0 substack article, how graphs are used to surface new pathways in this program

Part 0 Graphs

This preface explains, in plain terms, what the Graph stage does and why it is the essential first step in this workflow. The Graph stage turns a bibliography into a structured map of what the literature says and what it implies, surfacing three classes of pathways that spark experiments and new papers: OLD, OLD → EXTENDED, and NEW.

Why Graphs Matter

Reveal hidden connections — Many discoveries live between papers. Graphs make those cross‑paper links visible.
Prioritize what to read and test — Instead of scanning hundreds of PDFs, you get ranked leads with sentence‑level evidence.
Make stitching reproducible — The pipeline records the exact choices and fallbacks so every run can be audited and repeated.
Scale across domains — The same graph logic works for chemistry, biology, materials, social science, or any corpus of references.

What the Graph stage actually does

Ingests a canonical bibliography and optional local files.
Extracts text at sentence or paragraph level from abstracts or PDFs.
Normalizes terminology so synonyms and variants map to canonical tokens.
Builds a multi‑layer network with nodes for papers, concepts, reactions or events, and authors.
Creates multiple edge types: citations, co‑occurrence, semantic similarity, and reported links backed by evidence snippets.
Computes graph analytics such as centralities, PageRank, communities, and shortest paths.
Searches for chains that connect concept A to concept B and scores them by support, novelty, and plausibility.
Classifies surfaced chains into OLD, OLD → EXTENDED, or NEW and returns human‑readable previews plus full JSON arrays.
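The chain search in the steps above can be sketched in a few lines. This is a minimal illustration only, assuming the "reported link" edges have already been extracted as (paper, source concept, target concept) triples; the toy data, `build_concept_graph`, and `find_chains` are hypothetical names, not part of the LOOK block, and scoring is left out.

```python
from collections import defaultdict, deque

# Hypothetical toy input: (paper_id, source_concept, target_concept) triples
# standing in for "reported link" edges extracted from the corpus.
REPORTED_LINKS = [
    ("Paper12", "A", "B"),
    ("Paper37", "B", "C"),
    ("Paper40", "C", "D"),
]


def build_concept_graph(links):
    """Map each concept to its outgoing (next_concept, paper_id) edges."""
    graph = defaultdict(list)
    for paper_id, source, target in links:
        graph[source].append((target, paper_id))
    return graph


def find_chains(graph, start, end, max_hops=4):
    """Breadth-first search for cycle-free concept chains from start to end."""
    chains = []
    queue = deque([([start], [])])  # (concept path so far, supporting papers)
    while queue:
        path, papers = queue.popleft()
        if path[-1] == end:
            chains.append({"path_nodes": path, "supporting_papers": papers})
            continue
        if len(path) - 1 >= max_hops:
            continue  # hop budget used up
        for next_concept, paper_id in graph.get(path[-1], []):
            if next_concept not in path:  # never revisit a concept
                queue.append((path + [next_concept], papers + [paper_id]))
    return chains


graph = build_concept_graph(REPORTED_LINKS)
chains = find_chains(graph, "A", "C")
print(chains)
```

Here the A to C chain is stitched from Paper12 and Paper37, which is the kind of candidate the classification step then labels.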

How pathways are identified and classified

OLD
- Definition: a pathway where at least one paper in the corpus documents the full chain end‑to‑end.
- Why it matters: these are literature anchors and good control conditions.
OLD → EXTENDED
- Definition: a canonical paper documents the core chain; other independent papers add new links, modifiers, or conditional steps.
- Why it matters: high value, often publishable quickly because the canonical chain already exists.
NEW
- Definition: a stitched pathway assembled across two or more independent papers where no single paper asserts the full chain.
- Why it matters: hypothesis generators with high novelty but higher validation needs.
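One way to read these three definitions as code is shown below. It is a sketch under stated assumptions: each link is represented by the set of paper ids that report it, the caller decides which links are "core" and which are "extensions", and the function name and the extra UNSUPPORTED label are hypothetical.

```python
def classify_pathway(core_links, extension_links=()):
    """Classify a chain from the sets of papers supporting each link.

    core_links: one set of paper ids per link of the core chain.
    extension_links: optional sets of paper ids for added links or modifiers.
    """
    core = [set(papers) for papers in core_links]
    if not core:
        raise ValueError("a pathway needs at least one link")
    if any(not papers for papers in core):
        return "UNSUPPORTED"  # assumption: a link with no paper is not a pathway

    # Papers that document every core link, i.e. the chain end-to-end.
    full_chain_papers = set.intersection(*core)
    if full_chain_papers:
        # An extension counts only if some other paper contributes it.
        independent = [set(p) - full_chain_papers for p in extension_links]
        return "OLD-EXTENDED" if any(independent) else "OLD"

    # No single paper covers the whole chain, so it was stitched across papers.
    return "NEW"


print(classify_pathway([{"Paper12"}, {"Paper37"}]))
```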

Key detection rules

Require at least a configurable minimum of distinct supporting edges per stitched link.
Require at least one strong semantic/evidence signal or two moderate signals per link.
Flag low‑diversity support, contradictions, or single‑lab dominance rather than passing them.
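The first two rules can be written as a single check per link. This is only a sketch: the default numbers are taken from the LOOK block in this post (minimum 2 supporting edges, moderate threshold 0.65, strong threshold 0.8), and the function name and the one-score-per-edge input are assumptions.

```python
def link_is_supported(similarities, min_supporting_edges=2,
                      moderate_threshold=0.65, strong_threshold=0.8):
    """Return True if a stitched link meets the detection rules.

    similarities: one similarity score per distinct supporting edge.
    """
    if len(similarities) < min_supporting_edges:
        return False  # not enough distinct supporting edges
    strong = sum(score >= strong_threshold for score in similarities)
    # Strong signals also count as moderate ones.
    moderate = sum(score >= moderate_threshold for score in similarities)
    return strong >= 1 or moderate >= 2


print(link_is_supported([0.82, 0.50]))  # one strong signal, two edges
print(link_is_supported([0.70, 0.60]))  # only one moderate signal
```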

What you must provide to run the Graph stage

Minimum required

REFERENCES.txt — the canonical bibliography file. Each line should include a numeric id and at least a title or DOI.

Strongly recommended

abstracts.csv with columns ref_id,title,abstract to improve evidence extraction.
A pdfs/ folder with local PDFs for sentence‑level snippets.
precomputed_embeddings/ if you want to avoid recomputing embeddings.

Instruction block to include

Paste the LOOK logic block immediately before REFERENCES.txt. The LOOK block tells the LLM to run the Graph pipeline internally and return the pathway lists in chat.

Runtime questions the LLM will ask

How many OLD to surface (default 50)
How many OLD → EXTENDED to surface (default 20)
How many NEW to surface (default 100)
How many top previews to show for each class (defaults provided)
Semantic similarity threshold and minimum supporting edges

All answers are recorded in config_used so the run is reproducible.
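A config_used object of this kind could be assembled as follows. The default values are copied from the LOOK block in this post; `resolve_config` and its coercion behaviour are hypothetical, shown only to make "answers over defaults" concrete.

```python
# Defaults copied from the runtime_prompts in the LOOK block of this post.
DEFAULTS = {
    "num_old_pathways": 50,
    "num_old_extended": 20,
    "num_new_pathways": 100,
    "top_preview_old": 10,
    "top_preview_old_extended": 5,
    "top_preview_new": 10,
    "similarity_threshold": 0.65,
    "min_supporting_edges": 2,
}


def resolve_config(answers):
    """Merge user answers over the defaults; skipped prompts keep defaults."""
    unknown = set(answers) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown runtime variables: {sorted(unknown)}")
    config = dict(DEFAULTS)
    for name, value in answers.items():
        if value is None or value == "":
            continue  # treat a skipped prompt as "use the default"
        # Assumption: coerce each answer to the type of its default.
        config[name] = type(DEFAULTS[name])(value)
    return config


config_used = resolve_config({"num_new_pathways": "30", "similarity_threshold": ""})
print(config_used)
```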

What you will receive and how to use it

Immediate chat outputs

2–4 line human summary highlighting the most interesting findings.
Preview lists for OLD, OLD → EXTENDED, and NEW sized per your choices. Each preview item includes id, short title, 1–2 sentence summary, endpoints, ordered path nodes, supporting paper ids, one evidence snippet per link with similarity score, composite score, and a validation flag.
Full JSON arrays named old_pathways_full, old_extended_pathways_full, and new_pathways_full.
config_used object listing runtime answers and any fallbacks.

How to act on the outputs

Scan previews to pick 3–5 candidates for deeper reading.
Run VALIDATOR next to check schema, evidence presence, independence, contradictions, and stability.
Draft a one‑page research brief for top candidates and design minimal decisive experiments.
Record and share the LLM prompts and config_used when collaborating or publishing.

Quick checklist before you upload

Create REFERENCES.txt with numbered entries and DOIs where possible.
Paste the LOOK block immediately before REFERENCES.txt.
Optionally include abstracts.csv and a pdfs/ folder for richer evidence.
Decide whether to accept defaults or answer the runtime prompts when asked.
Expect the LLM to do the work internally and return the pathway lists in the chat.

Final note

The Graph stage is the creative engine of this program: it converts a bibliography into a navigable map of what the literature already says and what it implies but has not yet written. Use it to generate hypotheses, prioritize reading, and design small, decisive tests. Parts 1 through 3 then show how to interpret, validate, and convert those leads into briefs, experiments, and funded projects.

—

yup - please give me a code block to post right before REFERENCES.TXT - we will call it LOOK

Copilot said

json

{
  "id": "LOOK",
  "version": "1.0",
  "purpose": "Run the GRAPH pipeline internally on the provided REFERENCES.txt and return ranked OLD, OLD-EXTENDED, and NEW pathway lists (preview + full JSON arrays) in this chat.",
  "inputs_expected": {
    "references_file": "REFERENCES.txt (placed immediately after this prompt in the same upload)",
    "optional": ["abstracts.csv", "pdfs/ (local PDFs)", "precomputed_embeddings/"]
  },
  "runtime_prompts": [
    {"name":"num_old_pathways","prompt":"How many OLD pathways to surface?","default":50,"type":"integer"},
    {"name":"num_old_extended","prompt":"How many OLD → EXTENDED pathways to surface?","default":20,"type":"integer"},
    {"name":"num_new_pathways","prompt":"How many NEW (stitched) pathways to surface?","default":100,"type":"integer"},
    {"name":"top_preview_old","prompt":"How many top OLD to preview in the immediate reply?","default":10,"type":"integer"},
    {"name":"top_preview_old_extended","prompt":"How many top OLD → EXTENDED to preview?","default":5,"type":"integer"},
    {"name":"top_preview_new","prompt":"How many top NEW to preview in the immediate reply?","default":10,"type":"integer"},
    {"name":"similarity_threshold","prompt":"Semantic similarity threshold for 'moderate' edges (0-1)","default":0.65,"type":"float"},
    {"name":"min_supporting_edges","prompt":"Minimum distinct supporting edges required per stitched link","default":2,"type":"integer"}
  ],
  "execution_instructions": [
    "1) Prompt the user for the runtime variables above and record their answers.",
    "2) Parse the uploaded REFERENCES.txt into canonical ref_ids (use local abstracts/PDFs if available).",
    "3) Build a multi-layer literature graph (paper, concept, reaction/event, author) with edges: citation, cooccurrence, semantic_similarity, reported_link. Use sentence/paragraph embeddings (lightweight model preferred) or TF-IDF fallback.",
    "4) Compute centralities (degree, betweenness, closeness, PageRank), communities (Louvain), and shortest paths.",
    "5) Identify and rank pathways:",
    "   - OLD: pathway fully documented end-to-end in at least one paper in the corpus. Rank by reliability.",
    "   - OLD → EXTENDED: canonical paper documents the core chain; other independent paper(s) add new link(s) or modifiers. Rank by extension value + reliability.",
    "   - NEW: stitched pathway assembled across two or more independent papers; no single paper asserts the full chain. Require at least min_supporting_edges per stitched link and at least one strong (>=0.8) or two moderate (>= similarity_threshold) signals per link.",
    "6) For each surfaced pathway compute components: reliability, novelty (or novelty_of_extension), plausibility/extension_value, composite_score (default composite = 0.5*reliability + 0.3*novelty + 0.2*plausibility; for OLD→EXTENDED use 0.5*reliability + 0.3*extension_value + 0.2*novelty_of_extension).",
    "7) Conservative policy: flag ambiguous, low-diversity, or contradictory candidates rather than passing them.",
    "8) De-duplicate equivalent pathways and keep the highest-scoring variant.",
    "9) Record resolved runtime configuration (all prompt answers, embedding model used, thresholds, fallbacks) and include it in the reply."
  ],
  "reply_format_requirements": {
    "lead": "2–4 line human summary highlighting the most interesting OLD, OLD-EXTENDED, and NEW findings.",
    "sections": [
      "OLD PATHWAYS (preview size = top_preview_old): list items with fields id,type,title,summary,endpoints,path_nodes,supporting_papers,evidence_snippets(one per link with paper id and similarity),composite_score,validation_flag",
      "OLD-EXTENDED PATHWAYS (preview size = top_preview_old_extended): same fields plus canonical_paper and extension_papers",
      "NEW PATHWAYS (preview size = top_preview_new): same fields as OLD"
    ],
    "full_lists": "Include compact JSON arrays named old_pathways_full, old_extended_pathways_full, new_pathways_full sized to the user's runtime choices (or defaults). Each element must include the fields above plus a short notes string explaining classification.",
    "final_lines": [
      "Config used: list of runtime variable values and any fallbacks (embedding model, TF-IDF fallback).",
      "Next action suggestion: one-line recommendation to run VALIDATOR next."
    ]
  },
  "deliverables_in_chat_only": [
    "Human summary (2–4 lines)",
    "Preview lists for OLD, OLD-EXTENDED, NEW (as specified)",
    "Full JSON arrays old_pathways_full, old_extended_pathways_full, new_pathways_full",
    "config_used object"
  ],
  "constraints": [
    "Do all processing internally; do not produce or require downloadable files for this step.",
    "Operate subject-agnostically; do not apply domain-specific heuristics unless domain hints appear in the references.",
    "Be conservative: prefer flagging ambiguous items.",
    "Preserve exact field names and include similarity scores and evidence snippets for each link."
  ],
  "one_line_start_prompt": "Run GRAPH on the uploaded REFERENCES.txt, ask me the runtime variables above (use defaults if I don't answer), then return a 2–4 line summary and preview + full JSON arrays for OLD, OLD-EXTENDED, and NEW pathways; include config_used."
}
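Steps 6 and 8 of the block above are simple enough to check by hand. The sketch below uses the weights exactly as written in step 6; the equivalence rule in `deduplicate` (identical ordered path_nodes) is an assumption, since the LOOK block does not say what makes two pathways equivalent.

```python
def composite_score(reliability, novelty, plausibility):
    """Default composite from step 6 (OLD and NEW pathways)."""
    return 0.5 * reliability + 0.3 * novelty + 0.2 * plausibility


def composite_score_extended(reliability, extension_value, novelty_of_extension):
    """Variant from step 6 for OLD -> EXTENDED pathways."""
    return 0.5 * reliability + 0.3 * extension_value + 0.2 * novelty_of_extension


def deduplicate(pathways):
    """Step 8: keep the highest-scoring variant of each equivalent pathway.

    Assumption: pathways are equivalent when their ordered path_nodes match.
    """
    best = {}
    for pathway in pathways:
        key = tuple(pathway["path_nodes"])
        if key not in best or pathway["composite_score"] > best[key]["composite_score"]:
            best[key] = pathway
    return sorted(best.values(), key=lambda p: p["composite_score"], reverse=True)


candidates = [
    {"id": "NEW-1", "path_nodes": ["A", "B", "C"],
     "composite_score": composite_score(0.8, 0.5, 0.5)},
    {"id": "NEW-2", "path_nodes": ["A", "B", "C"],
     "composite_score": composite_score(0.6, 0.5, 0.5)},
    {"id": "NEW-3", "path_nodes": ["A", "D"],
     "composite_score": composite_score(0.9, 0.9, 0.9)},
]
ranked = deduplicate(candidates)
print([p["id"] for p in ranked])
```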

You said

give me a substack article (part 1) on what the script is for and what the user needs to do to use it

Copilot said

Overview

This is Part 1 of a short Substack guide explaining what the literature‑graph workflow does and what you must supply to run it. The goal is simple: hand an LLM a single canonical references file and a compact instruction block, and get back three creative, ranked lists that spark new research directions: OLD pathways that already exist in the literature, OLD → EXTENDED pathways where canonical chains are extended by other papers, and NEW stitched pathways that no single paper states end‑to‑end.

What the script is for

Surface collective insights that live between papers rather than inside them.
Reveal stitched mechanisms by chaining evidence across independent studies.
Highlight extensions where canonical pathways are modified or expanded.
Produce actionable leads for synthesis papers, experiments, or grant ideas.
Do this fast and subject‑agnostically so you can iterate on any corpus of references.

This is not a replacement for domain expertise. It is a discovery engine that turns a bibliography into a map of plausible, evidence‑backed connections worth human follow‑up.

What you need to provide

Minimum required

REFERENCES.txt placed in the working upload. Each line should be a numbered reference with at least a title or DOI.
Example lines
text

1. Smith J, 2020, Title of paper, DOI:10.1000/xyz123
2. Lee A, 2019, Another title, DOI:10.1000/abc456
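If you want to sanity-check the file before uploading, lines in this shape can be parsed with a couple of regular expressions. This is a rough sketch that assumes the "number, separator, free text, optional DOI" layout of the two example lines; the patterns and `parse_reference_line` are hypothetical, and other reference styles would need different rules.

```python
import re

# Leading numeric id, optional "." or ")", then the rest of the line.
LINE_PATTERN = re.compile(r"^\s*(\d+)[.)]?\s+(.*\S)\s*$")
# A loose DOI pattern: "10.", a registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b(10\.\d{4,9}/\S+)")


def parse_reference_line(line):
    """Return {'ref_id', 'text', 'doi'} for one line, or None if unnumbered."""
    match = LINE_PATTERN.match(line)
    if not match:
        return None
    ref_id, text = match.groups()
    doi_match = DOI_PATTERN.search(text)
    doi = doi_match.group(1).rstrip(".,;") if doi_match else None
    return {"ref_id": int(ref_id), "text": text, "doi": doi}


parsed = parse_reference_line("1. Smith J, 2020, Title of paper, DOI:10.1000/xyz123")
print(parsed)
```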

Highly recommended but optional

abstracts.csv with columns ref_id,title,abstract to improve semantic extraction.
Local PDFs for sentence‑level evidence extraction placed in a pdfs folder.
Precomputed embeddings if you want to avoid recomputation.
Terminology mapping CSV to canonicalize synonyms in your domain.

Instruction blocks to upload

LOOK logic block placed immediately before REFERENCES.txt. This single block tells the LLM to run the GRAPH pipeline internally and return OLD, OLD → EXTENDED, and NEW lists in the chat.

How the LLM run works at a glance

Prompt for runtime variables such as how many OLD and NEW pathways to surface. Defaults are provided.
Parse references into canonical IDs and fetch abstracts or use local PDFs if available.
Build a multi‑layer graph with nodes for papers, concepts, reactions, and authors and edges for citations, co‑occurrence, semantic similarity, and reported links.
Compute graph analytics including centralities, communities, and shortest paths.
Detect pathways and classify them as OLD, OLD → EXTENDED, or NEW using conservative rules that require multi‑signal support.
Rank and return preview lists and full JSON arrays for programmatic follow‑up.
Record config_used so every run is reproducible.

What you will receive in the chat

2–4 line human summary that highlights the most interesting findings.
Preview lists for OLD, OLD → EXTENDED, and NEW pathways sized by your runtime choices. Each preview item includes id, title, 1–2 sentence summary, endpoints, path nodes, supporting papers, one evidence snippet per link with similarity score, composite score, and a validation flag.
Full JSON arrays named old_pathways_full, old_extended_pathways_full, and new_pathways_full sized to your requested totals.
Config used object listing runtime answers and any fallbacks used by the LLM.
A single next action suggestion recommending running the validator to check evidence strength and diversity.

Quick user checklist before you upload

Create REFERENCES.txt with numbered entries and DOIs where possible.
Decide list and preview sizes or accept the defaults: OLD 50, OLD → EXTENDED 20, NEW 100, with previews of the top 10 OLD, top 5 OLD → EXTENDED, and top 10 NEW.
Optional: place abstracts.csv and a pdfs/ folder in the same upload if you want richer evidence extraction.
Paste the LOOK block immediately before uploading REFERENCES.txt. The LOOK block will prompt you for runtime variables when the LLM starts.
Expect the LLM to do the work internally and return the lists in the chat. Save the JSON arrays for downstream validation and local analysis.

What happens next

After you receive OLD, OLD → EXTENDED, and NEW lists from the LLM, run the VALIDATOR logic block to check schema integrity, evidence presence, independence of support, contradiction detection, and stability. The validator produces a filtered, annotated candidate list you can trust for deeper reading or experimental planning.

Closing note

This workflow is designed to be fast, conservative, and reproducible. It turns a bibliography into a creative map of what the literature already says and what it implies but has not yet written. Part 2 will walk through interpreting the top candidates and turning them into a short research brief you can hand to collaborators or reviewers.

You said

Part 2 substack please

Copilot said

Part 2 — From surfaced pathways to a short research brief

This installment shows how to read the LLM’s OLD, OLD → EXTENDED, and NEW pathway outputs, decide which leads to pursue, and convert a top candidate into a concise research brief you can hand to collaborators or use to plan experiments. The goal is to move quickly from inspiration to action while keeping evidence, uncertainty, and reproducibility explicit.

Quick orientation on pathway classes

OLD — a single paper in your corpus documents the full chain end‑to‑end. Treat these as confirmed literature syntheses; they are useful baselines and citation anchors.
OLD → EXTENDED — a canonical paper already states the chain; other papers add new links, conditions, or modifiers. Treat these as high‑value extensions: lower novelty than NEW but often higher feasibility.
NEW — the chain is assembled across two or more independent papers; no single paper asserts the full pathway. Treat these as hypothesis generators that require careful validation.

How to read a pathway block efficiently

Header line — id, type, composite score. Use the score to triage.
One‑line summary — captures the claim in plain language; read this first.
Endpoints and path_nodes — verify the start and end concepts and the ordered chain of papers/concepts.
Evidence snippets — read the snippet, note the paper id and section, and check the similarity score. Prioritize snippets from results or methods over introductions.
Validation flag — treat none as ready for deeper reading; treat needs_rescore, low_diversity, or contradiction_found as requiring extra scrutiny.
Notes — short rationale for classification; often contains the most useful cue for next steps.

Prioritization rubric for choosing leads

Use this lightweight rubric to pick 3–5 candidates to investigate further.

Reliability first (weight 0.5)
- Prefer candidates with multiple independent supporting papers and evidence from results/methods sections.
Novelty second (weight 0.3)
- Prefer NEW or OLD → EXTENDED items that add a nontrivial link or condition not widely reported.
Plausibility third (weight 0.2)
- Prefer chains that do not require extreme assumptions; domain heuristics or simple mechanistic plausibility increase rank.
Quick filters
- Exclude candidates with contradiction_found unless the contradiction itself is scientifically interesting.
- Flag low‑diversity candidates for targeted literature search before committing resources.
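The quick filters can be applied mechanically once the JSON arrays are saved. The sketch below assumes each candidate carries the composite_score and validation_flag fields described in this post; `triage` is a hypothetical helper, and the shortlist size of 5 simply follows the "3–5 candidates" advice.

```python
def triage(candidates, top_n=5):
    """Drop contradicted candidates, shortlist by score, note low-diversity ones."""
    kept = [c for c in candidates if c["validation_flag"] != "contradiction_found"]
    kept.sort(key=lambda c: c["composite_score"], reverse=True)
    shortlist = kept[:top_n]
    # Low-diversity items stay on the list but need a targeted literature search.
    needs_literature_search = [
        c["id"] for c in shortlist if c["validation_flag"] == "low_diversity"
    ]
    return shortlist, needs_literature_search


example = [
    {"id": "NEW-1", "composite_score": 0.71, "validation_flag": "none"},
    {"id": "NEW-2", "composite_score": 0.80, "validation_flag": "contradiction_found"},
    {"id": "EXT-1", "composite_score": 0.66, "validation_flag": "low_diversity"},
]
shortlist, needs_search = triage(example)
print([c["id"] for c in shortlist], needs_search)
```

A contradicted candidate that is scientifically interesting in its own right would have to be added back by hand; the sketch does not try to judge that.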

Turning a pathway into a one‑page research brief

Below is a compact template you can fill in for each top candidate. Keep it to one page: clear claim, evidence, gaps, proposed next steps, and immediate asks.

Research Brief Template

Title
Short descriptive title (one line).

Claim
One sentence: what the pathway asserts (A → … → Z).

Why it matters
Two sentences: potential impact, novelty, and who cares.

Evidence summary

Canonical support: list paper ids and one‑line evidence from each that covers parts of the chain.
Stitched support: list paper ids that provide the connecting links and the exact snippet or method detail that supports each link.

Confidence and risks

Confidence score (0–1) with brief justification.
Main risks: contradictory evidence, single‑lab bias, weak semantic matches.

Key unknowns and experiments

Bullet list of 2–4 concrete experiments or analyses that would validate the chain (e.g., replicate condition X, measure intermediate B, run a control).
For computational leads: specify datasets, models, or simulations needed.

Immediate next actions (week 0–4)

1–2 literature checks (which PDFs to read in full).
1 small experiment or in‑silico test to raise confidence.
1 person to contact or one collaboration ask.

Attachments

Evidence snippets (from the LLM output) and config_used excerpt.

Example brief (compact)

Title
A converts to C via B under low pH conditions

Claim
Paper12 shows A→B at pH 5.5; Paper37 shows B→C oxidation at pH 6.0; no paper reports A→C end‑to‑end.

Why it matters
If true, this pathway suggests a simple pH‑tunable route to C that could simplify synthesis and reduce steps.

Evidence summary

Canonical support: Paper12 (results): “A is converted to B when enzyme E is upregulated” (similarity 0.82).
Stitched support: Paper37 (methods): “B oxidizes to C at pH 6.0” (similarity 0.78).

Confidence and risks

Confidence 0.62: two independent papers from different labs, but under slightly different pH conditions.
Risk: reaction conditions may not overlap; intermediate B may be unstable.

Key unknowns and experiments

Test A→B at pH 6.0 in the same buffer used by Paper37.
Measure B lifetime under those conditions.
Run a one‑pot A→C trial with time‑course sampling.

Immediate next actions

Read full PDFs of Paper12 and Paper37.
Run a 2‑day bench test for A→B at pH 6.0.
Draft a short email to potential collaborator with expertise in enzyme E.

Practical checklist before you act

Download the full JSON for the candidate and the evidence snippets.
Open the cited papers’ full texts (methods and results).
Recompute local embeddings or run a quick TF‑IDF check if the candidate was flagged needs_rescore.
Check diversity: are supporting papers from different groups? If not, deprioritize or seek independent replication.
If contradictions exist, extract the conflicting snippets and add them to the brief under “Main risks.”

Next steps after the brief

Run the VALIDATOR block to get a formal validation report and rescored candidates.
For top 1–2 briefs, prepare a short experimental plan or computational protocol and a one‑page budget/time estimate.
Use the brief to solicit quick feedback from a domain colleague before committing lab time.

Closing note

The LLM‑driven graph surfaces creative, evidence‑backed leads quickly. Your job is to treat those leads like hypotheses: verify the evidence, quantify uncertainty, and design small, decisive tests. Use the one‑page brief to move from inspiration to action without getting lost in the literature.

what is the format for abstract file?

Abstract file format

Filename: abstracts.csv

Format: UTF‑8 encoded CSV, comma separated, double‑quoted fields when needed, LF newlines. Each row is one abstract record; the ref_id must match an entry in your REFERENCES.txt.

Required columns

Column | Description
ref_id | Canonical reference identifier that exactly matches an id in REFERENCES.txt
title | Paper title (short)
abstract | Full abstract text (one paragraph)

Recommended optional columns

Column | Description
year | Publication year
doi | DOI string
authors | Short author list (e.g., “Smith J; Lee A”)
language | ISO language code (e.g., en, fr)
source_file | Local PDF filename if available (e.g., pdfs/Paper12.pdf)

Validation rules and tips

ref_id matching: ref_id must match exactly the identifier used in REFERENCES.txt. Use the same punctuation and casing.
No duplicates: one row per ref_id. If multiple abstracts exist for a ref, merge or keep the most complete one.
Field quoting: wrap fields in double quotes if they contain commas, newlines, or quotes; escape internal quotes as "".
Length: abstracts may be long; keep the full abstract but avoid embedding entire full texts in this file.
Missing abstracts: omit the row or include the ref_id with an empty abstract field; the Graph pipeline will fall back to title/metadata.
Language: include language if corpus is multilingual to enable appropriate embedding/model choices.
File size: split very large files (>100k rows) into chunks named abstracts_part1.csv, etc., for reliability.

Example header and sample row

csv

ref_id,title,abstract,year,doi,authors,language,source_file

Paper12,"A novel conversion of A to B","Paper12 reports that A is converted to B under enzyme E upregulation; experiments performed at pH 5.5 showed consistent conversion.",2020,10.1000/xyz123,"Smith J; Lee A",en,pdfs/Paper12.pdf

Quick checklist before upload

Ensure every ref_id in abstracts.csv appears in REFERENCES.txt.
Save as UTF‑8 CSV and verify no stray control characters.
If you have PDFs, include source_file paths that match your pdfs/ folder.
If unsure, provide a small sample (5–10 rows) first to confirm parsing.
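Most of this checklist can be automated with the standard csv module. The following is a minimal sketch, not the pipeline's own validator: `check_abstracts` is a hypothetical helper, it reads CSV text from a string so the example is self-contained, and it checks only the required header, unknown ref_ids, and duplicates.

```python
import csv
import io

REQUIRED_COLUMNS = ["ref_id", "title", "abstract"]


def check_abstracts(csv_text, known_ref_ids):
    """Return a list of human-readable problems found in abstracts.csv text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing required columns: {', '.join(missing)}"]

    problems = []
    seen = set()
    # Records are counted, not lines, because quoted fields may span lines.
    for record_number, row in enumerate(reader, start=1):
        ref_id = row["ref_id"]
        if ref_id not in known_ref_ids:
            problems.append(f"record {record_number}: ref_id {ref_id!r} not in REFERENCES.txt")
        if ref_id in seen:
            problems.append(f"record {record_number}: duplicate ref_id {ref_id!r}")
        seen.add(ref_id)
    return problems


sample = (
    "ref_id,title,abstract\n"
    'Paper12,"A novel conversion of A to B","A is converted, to B."\n'
    'Paper12,"Duplicate entry","Same ref_id again."\n'
    'Paper99,"Unknown paper","Not in the references file."\n'
)
problems = check_abstracts(sample, known_ref_ids={"Paper12"})
print(problems)
```

For a real file, read it with `open(path, newline="", encoding="utf-8")` and pass the text (or adapt the function to take the file object directly).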

If you want, I can produce a one‑line validator script (CSV header checks only) you can run locally to confirm the file matches expectations.

You said

show me VALIDATOR please

Copilot said

VALIDATOR

Below is a complete, copy‑pasteable VALIDATOR logic block and an operational guide you can upload to an executor LLM or keep as the canonical spec for a local validator. Use this to verify GRAPH outputs for schema, evidence presence, independence of support, contradiction detection, and stability. The block includes required inputs, runtime prompts, validation rules, expected outputs, and recommended conservative settings.

json

{
  "name": "VALIDATOR",
  "version": "1.0",
  "purpose": "Independently verify GRAPH outputs for schema correctness, evidence presence, independence/diversity of support, contradiction detection, and stability; produce validated candidate lists and a validation report.",
  "inputs_expected": {
    "graph_outputs_dir": "working_directory containing GRAPH outputs",
    "required_files": [
      "graph.gexf",
      "metrics_summary.csv",
      "stitched_candidates.csv",
      "evidence_maps.json",
      "config_used.json",
      "REFERENCES.txt"
    ],
    "optional_files": [
      "abstracts.csv",
      "pdfs/",
      "precomputed_embeddings/"
    ]
  },
  "runtime_prompts": [
    {"name":"validation_mode","prompt":"Validation mode: quick or thorough?","default":"thorough","type":"string"},
    {"name":"recompute_embeddings","prompt":"Recompute embeddings for evidence snippets? (true/false)","default":true,"type":"boolean"},
    {"name":"embedding_model_override","prompt":"If recomputing, which embedding model to use? (leave blank to use default)","default":"all-MiniLM-L6-v2","type":"string"},
    {"name":"similarity_threshold_pass","prompt":"Similarity threshold to consider a link supported (0-1)","default":0.8,"type":"float"},
    {"name":"min_supporting_edges","prompt":"Minimum distinct supporting edges required per stitched link","default":2,"type":"integer"},
    {"name":"require_independence_groups","prompt":"Require supporting papers from at least N independent groups (0 to disable)","default":2,"type":"integer"},
    {"name":"bootstrap_resamples","prompt":"Number of bootstrap resamples for stability testing","default":100,"type":"integer"},
    {"name":"strict_mode","prompt":"If true, raise flagged items to fail unless human overrides","default":false,"type":"boolean"}
  ],
  "validation_steps": [
    "1. Verify presence and schema of required files; fail early with clear error messages if missing or malformed.",
    "2. Parse stitched_candidates.csv and evidence_maps.json; ensure every candidate references existing paper ids in REFERENCES.txt and graph.gexf.",
    "3. For each candidate, re-extract evidence snippets from abstracts.csv or local PDFs if recompute_embeddings=true; otherwise validate that evidence_snippets in evidence_maps.json point to valid source locations.",
    "4. If recompute_embeddings=true, compute embeddings for each evidence snippet using embedding_model_override; otherwise compute TF-IDF vectors as fallback.",
    "5. Recompute similarity scores for each link and compare to similarity_threshold_pass; mark link as supported, weak, or unsupported.",
    "6. Check independence: group supporting papers by author groups or affiliations; compute number of independent groups and flag low-diversity support if below require_independence_groups.",
    "7. Detect contradictions: search evidence snippets for negations, explicit refutations, or mutually exclusive claims; annotate contradictions with source locations and severity.",
    "8. Run bootstrap stability: resample evidence snippets and recompute composite scores for each candidate bootstrap_resamples times; compute mean, std, and confidence intervals.",
    "9. Rescore candidates: recompute composite_score using validated link support, independence penalty, contradiction penalty, and stability bonus.",
    "10. Classify each candidate as pass, flag, or fail based on thresholds and strict_mode.",
    "11. Produce human-readable validation notes for each candidate explaining why it passed, was flagged, or failed."
  ],
  "scoring_and_rules": {
    "composite_score_formula": "composite = 0.5*reliability + 0.3*novelty + 0.2*plausibility; apply independence_penalty and contradiction_penalty; add stability_bonus",
    "independence_penalty": "if independent_groups < require_independence_groups then subtract 0.15 per missing group down to -0.45",
    "contradiction_penalty": "if contradiction severity = minor subtract 0.1; moderate subtract 0.25; major subtract 0.5",
    "stability_bonus": "if bootstrap std < 0.05 add 0.05; if std < 0.02 add 0.1",
    "pass_threshold": 0.6,
    "flag_threshold": 0.4
  },
  "expected_outputs": {
    "files": [
      {"name":"validation_report.json","format":"JSON","description":"Summary statistics, runtime config, counts of pass/flag/fail, bootstrap summaries"},
      {"name":"stitched_candidates_validated.csv","format":"CSV","description":"Candidate_id,type,composite_score_before,composite_score_after,validation_status,validation_notes"},
      {"name":"stitched_candidates_rescored.csv","format":"CSV","description":"Full rescored candidate list with component scores and penalties applied"},
      {"name":"validation_log.txt","format":"TXT","description":"Step-by-step log of validation actions and any errors encountered"},
      {"name":"config_validation_snapshot.json","format":"JSON","description":"Resolved validator runtime parameters and model choices"},
      {"name":"contradictions_summary.csv","format":"CSV","description":"Candidate_id,conflicting_snippet,source_paper,severity,location"},
      {"name":"bootstrap_stats.csv","format":"CSV","description":"Candidate_id,mean_score,std_score,ci_lower,ci_upper"}
    ],
    "delivery_instructions":"Return files as base64 attachments or as downloadable files depending on the LLM interface. Ensure filenames match exactly and are placed in the graph_outputs_dir."
  },
  "quality_checks_and_sanity": [
    "All paper ids referenced in outputs must exist in REFERENCES.txt",
    "No candidate path should include the same paper twice unless explicitly allowed and documented",
    "Each stitched link must cite at least one evidence snippet; if not, mark candidate as fail",
    "If recompute_embeddings fails, fallback to TF-IDF and record fallback in config_validation_snapshot.json"
  ],
  "notes_for_executor": [
    "Treat GRAPH outputs as inputs only; do not assume GRAPH snippets are correct without re-extraction when recompute_embeddings=true",
    "Prefer conservative decisions: flag ambiguous items rather than passing them",
    "Record every fallback, model choice, and deviation in config_validation_snapshot.json",
    "If strict_mode=true, escalate flagged items to fail unless human override is provided",
    "Produce clear, actionable validation_notes for each candidate to guide human reviewers"
  ],
  "privacy_and_safety": {
    "do_not_fetch":"paywalled full texts unless user explicitly allows",
    "data_retention":"Return only derived artifacts; do not retain or store raw PDFs beyond runtime unless user permits"
  }
}
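The scoring_and_rules section above translates fairly directly into code. This is a sketch under assumptions the block leaves open: the two stability bonuses are treated as alternatives rather than cumulative, the rescored value is clamped to the 0–1 range, and all function names are hypothetical.

```python
CONTRADICTION_PENALTY = {"none": 0.0, "minor": 0.1, "moderate": 0.25, "major": 0.5}


def independence_penalty(independent_groups, required_groups=2):
    """0.15 per missing independent group, capped at 0.45."""
    missing = max(0, required_groups - independent_groups)
    return min(0.15 * missing, 0.45)


def stability_bonus(bootstrap_std):
    """Assumption: the two bonuses are alternatives, not added together."""
    if bootstrap_std < 0.02:
        return 0.1
    if bootstrap_std < 0.05:
        return 0.05
    return 0.0


def rescore(composite, independent_groups, contradiction, bootstrap_std,
            required_groups=2):
    """Apply penalties and bonus to a composite score from GRAPH."""
    score = (composite
             - independence_penalty(independent_groups, required_groups)
             - CONTRADICTION_PENALTY[contradiction]
             + stability_bonus(bootstrap_std))
    return max(0.0, min(1.0, score))  # clamping is an assumption


def validation_status(score, pass_threshold=0.6, flag_threshold=0.4,
                      strict_mode=False):
    """Map a rescored value to pass, flag, or fail."""
    if score >= pass_threshold:
        return "pass"
    if score >= flag_threshold:
        return "fail" if strict_mode else "flag"  # strict_mode escalates flags
    return "fail"


after = rescore(0.65, independent_groups=1, contradiction="minor", bootstrap_std=0.03)
print(round(after, 2), validation_status(after))
```

In this example a candidate that starts at 0.65 loses 0.15 for having one independent group instead of two and 0.1 for a minor contradiction, gains 0.05 for stability, and lands in the flag band.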

How to use VALIDATOR in practice

Where to place it

Upload the VALIDATOR block and the GRAPH outputs into the same executor LLM session or local validator tool. Point graph_outputs_dir at the folder containing all six required files: graph.gexf, metrics_summary.csv, stitched_candidates.csv, evidence_maps.json, config_used.json, and REFERENCES.txt.

Recommended runtime choices for trustworthy validation

validation_mode: thorough for any candidate you plan to act on.
recompute_embeddings: true to force independent evidence scoring.
embedding_model_override: choose a different model family than GRAPH used if possible.
similarity_threshold_pass: 0.8 for strong support; lower only for exploratory triage.
require_independence_groups: 2 to ensure multi‑group support.
bootstrap_resamples: 100 for stability estimates.
strict_mode: true if you want conservative pass/fail decisions.
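To see what the bootstrap_resamples setting buys you, here is a small stability sketch. It assumes, purely for illustration, that a candidate's score is the mean of its per-link similarity scores; the real composite has more components, and `bootstrap_stability` is a hypothetical name. The output keys mirror the columns of bootstrap_stats.csv.

```python
import random
import statistics


def bootstrap_stability(link_scores, resamples=100, seed=0):
    """Resample link scores with replacement and summarise the spread.

    Stand-in assumption: the candidate score is the mean of link_scores.
    """
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    resampled_means = []
    for _ in range(resamples):
        sample = rng.choices(link_scores, k=len(link_scores))
        resampled_means.append(statistics.fmean(sample))
    resampled_means.sort()
    # Simple percentile interval covering roughly the middle 95 percent.
    lower_index = int(0.025 * (resamples - 1))
    upper_index = int(0.975 * (resamples - 1))
    return {
        "mean_score": statistics.fmean(resampled_means),
        "std_score": statistics.pstdev(resampled_means),
        "ci_lower": resampled_means[lower_index],
        "ci_upper": resampled_means[upper_index],
    }


stats = bootstrap_stability([0.82, 0.78, 0.70])
print(stats)
```

A small std_score means the candidate's score does not hinge on any one snippet, which is what the stability bonus in the VALIDATOR block rewards.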

Example validation workflow

Run GRAPH and collect outputs.
Run VALIDATOR with recompute_embeddings=true and strict_mode=false for initial pass.
Review stitched_candidates_validated.csv and human‑inspect top 10 and any pass items.
For top 1–5 candidates, re-run VALIDATOR with strict_mode=true and a different embedding model or run a local Python validator for independent confirmation.
Archive config_used.json and config_validation_snapshot.json with your project records.

Independence note

Running VALIDATOR on the same LLM that produced GRAPH is acceptable for rapid triage if you force independent recomputation and use conservative thresholds, but for high‑stakes decisions obtain an independent validation pass (different model or local tooling) before committing resources.

If you want, I can now produce a compact one‑line validator prompt you can paste to an executor LLM to run VALIDATOR on a given graph_outputs_dir, or generate a local Python validator checklist you can run offline. Which would you like?

BadProtein
