April 20, 2026
Existing extraction systems find information in text and give it a confidence score: 0.87. On its own, that number explains nothing. Is it 87% because the OCR is bad? Because the name is ambiguous? Because the pattern matched weakly?
This system does something different. It extracts information and simultaneously explains how certain it is and why.
Historical archives are scanned with imperfect OCR. They're written in three languages. They contain 11 million words across 28,461 text chunks. No one has time to read them all. So you build a system to find names, dates, locations, and actions automatically.
But if you can't say how reliable each finding is, the system is unusable for scholarship. A historian needs to know: is this date certain? Is this name possibly an OCR error? Is this attribution based on the text itself or on an inference?
Every extraction is assigned a certainty layer from the 12-layer Synaptic Architecture model:
| Layer | Meaning | Example | Certainty |
|---|---|---|---|
| L2 | It says so literally | "6 March 1942" | 100% |
| L3 | Source convention | "7.11.44" from MI5 (always day.month) | 99% |
| L6 | Pattern recognition | "Nov 41", "in March 1942" | 95-98% |
| L7 | Vague indication | "early March", "Spring 1942" | 85-90% |
The layer doesn't just say how certain — it says why. And it says how to improve: an L7 date can become L2 when another source confirms it with an exact date. Cross-source confirmation promotes the certainty layer.
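As a rough illustration of the table and the promotion rule, the sketch below assigns a layer to a date string and promotes a vague date when a different source supplies an exact one. The layer numbers come from the table. The regular expressions, the `DateMention` type and the function names are assumptions made for this example, not the system's actual patterns. It also assumes the caller only passes mentions that already refer to the same event.

```python
import re
from dataclasses import dataclass

# Layer numbers follow the table; the regexes are illustrative assumptions.
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*"
LAYER_PATTERNS = [
    (2, re.compile(rf"^\d{{1,2}} {MONTH} \d{{4}}$")),            # "6 March 1942"
    (3, re.compile(r"^\d{1,2}\.\d{1,2}\.\d{2}$")),               # "7.11.44"
    (6, re.compile(rf"^(?:in )?{MONTH} (?:\d{{4}}|'?\d{{2}})$")),  # "Nov 41"
    (7, re.compile(r"^(?:early|mid|late|spring|summer|autumn|winter)\b",
                   re.IGNORECASE)),                              # "early March"
]


@dataclass
class DateMention:
    text: str
    source: str
    layer: int


def classify(date_text):
    """Return the first matching certainty layer, or None if nothing matches."""
    for layer, pattern in LAYER_PATTERNS:
        if pattern.search(date_text.strip()):
            return layer
    return None


def promote(mention, others):
    """Promote to L2 if a different source gives an exact (L2) date.

    Assumption: `others` only contains mentions of the same event.
    """
    for other in others:
        if other.source != mention.source and other.layer == 2:
            return 2
    return mention.layer
```

For example, `promote(DateMention("early March", "De Jong", 7), [DateMention("6 March 1942", "MI5", 2)])` yields layer 2, while the same call with no confirming mention leaves the date at layer 7.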
Every pattern improvement invalidates the table. Solution: 43-second full recomputation. No versioning needed at this scale.
56 pattern types cover six dimensions. A seventh dimension (irony, causal connections) is invisible. Solution: add patterns when a question requires something new.
86% of dates are not in the same sentence as the action they date. Solution: chunk-wide date coupling with distance measurement, which raised date coverage from 5% to 60%.
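One way to picture chunk-wide coupling: look for dates anywhere in the chunk, not only in the action's own sentence, and keep the distance as evidence. The sketch below is a simplified assumption of how that could work. It only recognises exact dates, measures distance in characters, and `couple_dates` is a hypothetical name.

```python
import re

# Illustrative pattern for exact dates only; the real system has many more.
DATE_RE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4}\b"
)


def couple_dates(chunk, action_words):
    """Pair each action word with the nearest date in the whole chunk.

    Returns (action, date, distance_in_characters) tuples; the distance
    can later feed into how much the coupling is trusted.
    """
    dates = [(m.start(), m.group()) for m in DATE_RE.finditer(chunk)]
    couplings = []
    if not dates:
        return couplings
    for word in action_words:
        for m in re.finditer(rf"\b{re.escape(word)}\b", chunk):
            pos, date = min(dates, key=lambda d: abs(d[0] - m.start()))
            couplings.append((word, date, abs(pos - m.start())))
    return couplings


chunk = "Lauwers was arrested in The Hague. The raid took place on 6 March 1942."
print(couple_dates(chunk, ["arrested"]))
```

Here the date sits in the sentence after the action, so a sentence-level matcher would miss it, while the chunk-wide search still couples the two and reports how far apart they are.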
If all sources omit the same person, we can't see them either. Solution: add sources from different perspectives. Russian archives fill Western blind spots.
0.8% of canonical names are ambiguous ("Burgers" = person or citizens?). Solution: a context check. A title or rank before the word marks a person; an article before it marks a common noun.
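The context check can be sketched as a look at the token immediately before the ambiguous word. The word lists below are small illustrative assumptions, not the system's actual lexicon, and `classify_ambiguous` is a hypothetical name.

```python
# Illustrative word lists (assumptions); a real lexicon would be larger
# and language-specific.
TITLES = {"mr", "mr.", "dr", "dr.", "prof.", "major", "captain",
          "colonel", "lieutenant", "general"}
ARTICLES = {"the", "a", "an", "de", "het", "een"}


def classify_ambiguous(tokens, index):
    """Decide whether tokens[index] is a person or a common word.

    A title or rank before the word means person, an article means
    common word, and anything else stays undecided.
    """
    if index == 0:
        return "unknown"
    previous = tokens[index - 1].lower()
    if previous in TITLES:
        return "person"
    if previous in ARTICLES:
        return "common"
    return "unknown"


print(classify_ambiguous("Major Burgers reported to London".split(), 1))
print(classify_ambiguous("de burgers van Den Haag".split(), 1))
```

Cases that return `"unknown"` would need further evidence, which is why the check is described as one step and not a complete disambiguator.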
"Lauwers, arrested, March '42, The Hague" appearing in one chunk doesn't prove the four items belong together. The solution is layered: distance measurement (90% → 95%), verb direction (95% → 97%), enumeration detection (97% → 99%), and an LLM batch for the 0.08% remainder (99.92% is resolved without an LLM).
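The layered approach can be read as a cascade: each cheap check either decides or passes the candidate on, and only what no check can decide goes to the LLM batch. The sketch below shows that control flow only. The candidate fields, the distance threshold and the stage functions are all hypothetical stand-ins, not the system's real rules.

```python
# Hypothetical candidate: a dict describing a name, action and date that
# co-occur in one chunk. All field names and thresholds are assumptions.

def by_distance(candidate):
    # Assumed rule: items within 10 tokens are accepted outright.
    return True if candidate["distance"] <= 10 else None


def by_verb_direction(candidate):
    # Assumed flag: True/False if the verb points at (or away from) the
    # name, None if the verb gives no direction.
    return candidate.get("verb_targets_name")


def by_enumeration(candidate):
    # Assumed rule: items inside an enumeration of several people are
    # treated as not belonging together.
    return False if candidate.get("in_enumeration") else None


STAGES = [
    ("distance", by_distance),
    ("verb", by_verb_direction),
    ("enumeration", by_enumeration),
]


def resolve(candidate):
    """Return (stage_name, verdict); undecided cases go to the LLM batch."""
    for name, stage in STAGES:
        verdict = stage(candidate)
        if verdict is not None:
            return name, verdict
    return "llm_batch", None


print(resolve({"distance": 3}))
print(resolve({"distance": 50}))
```

Recording which stage decided keeps the "why" attached to every coupling, in the same spirit as the certainty layers.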
53 chunks across 4 sources (De Jong, MI5, Parliamentary Inquiry, Mitrokhin Archive) describe the arrest of radio operator Lauwers in 1942. The system reconstructs:
Persons involved (all sources combined): Giskes, Schreieder, Ridderhof, Taconis, Kup, Bodens
Locations: The Hague, Scheveningen, Driebergen
Date: 6 March 1942 — 7 independent mentions, 2 sources, certainty layer L2
MI5 adds two persons that De Jong never mentions: Kup and Bodens. That is not an anecdote — scaled across the entire corpus, it reveals 1,174 persons in MI5 files that the 30-volume standard work omits.
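Measuring a blind spot of this kind comes down to a set difference over canonical names. The sketch below assumes persons have already been resolved to canonical names per source. The sample sets are an illustrative subset: the text only states that Kup and Bodens are absent from De Jong, and `blind_spots` is a hypothetical name.

```python
def blind_spots(persons_by_source, reference, other):
    """Persons that `other` mentions and `reference` never does."""
    return sorted(persons_by_source[other] - persons_by_source[reference])


# Illustrative subset of the Lauwers case, not the real extraction output.
persons_by_source = {
    "De Jong": {"Giskes", "Schreieder"},
    "MI5": {"Giskes", "Kup", "Bodens"},
}

print(blind_spots(persons_by_source, reference="De Jong", other="MI5"))
```

Run over the full corpus instead of one event, the same comparison would produce the kind of list behind the 1,174 figure.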
The system automatically found five cases where De Jong's dates conflict with primary sources:
| Person | Action | De Jong | Other source | Difference |
|---|---|---|---|---|
| Christiansen | appointment | 19 July 1940 (L2) | Inquiry: 25 June 1940 (L2) | 24 days, both hard dates |
| Ferwerda | appointment | Sept '44 (L6) | Inquiry: 30 Aug 1944 (L2) | ~1 day |
| Sevenster | appointment | Summer '40 (L7) | Inquiry: Autumn '40 (L7) | 1-3 months, both vague |
| Sikorski | death | Nov '44 (L3) | Inquiry: March '45 (L6) | 4 months |
| Wehner | death | March '45 (L3) | MI5: Oct '44 (L6) | 5 months |
The Christiansen case is the strongest: two L2 dates (hard evidence) that clash by 24 days about the same appointment. That is a testable historical discrepancy.
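The Christiansen comparison can be reproduced with plain date arithmetic. The sketch below flags a conflict only when both dates are layer L2 and they differ, which is an assumed simplification of the system's filtering. `hard_conflict` is a hypothetical name.

```python
from datetime import date


def hard_conflict(date_a, layer_a, date_b, layer_b):
    """Return the gap in days if two L2 dates disagree, else None."""
    if layer_a == 2 and layer_b == 2 and date_a != date_b:
        return abs((date_a - date_b).days)
    return None


# Christiansen appointment: De Jong versus the Parliamentary Inquiry.
gap = hard_conflict(date(1940, 7, 19), 2, date(1940, 6, 25), 2)
print(gap)  # 24
```

Vague dates (L6, L7) would need interval comparison instead of exact equality, which is why the other four cases are weaker evidence.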
| Metric | Value |
|---|---|
| Pattern types | 56 (6 domain plugins) |
| Total extractions | 132,122 |
| Dates found | 51,446 (with certainty layers) |
| Persons recognised | 96% (33,783 canonical) |
| Scan time | 43 seconds on 11.4M words |
| Resolvable without LLM | 99.92% |
| Contradictions found | 5 (after filtering 26 false positives) |
| Blind spots measured | 1,174 persons MI5 knows that De Jong omits |
Every finding is:
The system doesn't say "Lauwers was probably arrested in March." It says: "7 independent mentions from 2 sources confirm 6 March 1942, certainty layer 2, here are the 7 sentences."
Part of the Life Lens System. Built with Claude Code on 20 April 2026.