Ana Learns to See
In the previous piece I described how Ana, Mouseion’s processing layer, reads source material with comprehension — not just the text but also who appears in it, what connections exist, and at what certainty level. What I didn’t tell you is that Ana wasn’t always good at this, and that last week we made discoveries that fundamentally expanded her perceptual ability.
It started with a practical problem. At the Dutch National Archives, Monique Brinks photographed over 746 scans from the dossiers of Bureau Inlichtingen — the Dutch intelligence service that operated from London and later Eindhoven during and after the war. Those scans contain correspondence ledgers with tens of thousands of entries, agent registers, forced labour lists, and carbon copies of internal reports. An enormously rich collection, but recorded on paper in the 1940s, photographed with an iPhone in an archive reading room, and therefore exactly the kind of material where conventional text recognition struggles.
The first reading produced results that looked usable at first glance but, on closer inspection, turned out to contain structural errors. In the correspondence ledgers — pre-printed cashbook forms with red column lines — Ana confused columns: sequence numbers ended up in the sender field, dates appeared where names should be, and reference codes were read as subjects. Not because the text was illegible, but because Ana didn’t understand how the form was structured. She read the words correctly but placed them in the wrong box.
With the carbon copies — thin sheets with blue or purple ink that was already faint in 1944 and is now barely visible — the problem was more fundamental. Ana reported the document as unreadable and delivered empty or largely guessed transcriptions. The archival world would say at that point: find a better copy, or accept that this information is lost. But I didn’t have a better copy, and the information in those carbons is precisely what makes the difference for Monique’s research.
The first breakthrough came from an unexpected direction. We tested five different text recognition engines on the same scan and discovered that the choice of engine mattered far less than the quality of what we fed the engine. A good crop with the simplest engine beats a bad crop with the best engine. The reader wasn’t the problem — the specimen was.
That insight led to what I call smart crop: a script that analyses the scan before Ana starts reading. The first step is finding the book spine — every scan is a photograph of two open pages, and the spine is the darkest vertical strip somewhere around the middle. Once you detect it automatically via a brightness profile across all columns, you can separate the two pages and present each one individually, without the dark shadow of the spine distorting the text.
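Sketched in Python with OpenCV, the spine detection looks roughly like this. The central search window and the trim margin are illustrative choices, not the script’s actual values:

```python
import cv2
import numpy as np

def split_pages(scan_path):
    """Split a two-page scan at the book spine."""
    img = cv2.imread(scan_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Mean brightness per pixel column: the spine is a deep, dark valley.
    profile = gray.mean(axis=0)

    # Search only the middle third for the darkest column, so a dark
    # margin at the edges can't masquerade as the spine.
    w = gray.shape[1]
    lo, hi = w // 3, 2 * w // 3
    spine = lo + int(np.argmin(profile[lo:hi]))

    # Return both pages, trimming a small band around the spine shadow.
    margin = w // 50
    return img[:, :spine - margin], img[:, spine + margin:]
```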
But the real surprise was in the red lines. The correspondence ledgers are pre-printed forms with vertical red lines separating the columns: sequence number, date, sender, addressed to, subject, dossier number. Those lines exist for the human eye and get ignored by conventional text recognition — they’re not text, after all. But if you detect them through colour filtering — looking for pixels where the red channel dominates over green and blue — you can determine precisely where each column begins and ends. The script counts how many red pixels stack vertically per column, and where those stacks form peaks, that’s where a line runs.
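In the same sketch style, and continuing with the imports above, the red-line detection: a pixel counts as red when its red channel clearly dominates the other two, and peaks in the per-column counts mark the lines. The dominance margin, peak threshold, and cluster distance here are illustrative:

```python
def find_column_lines(img, min_peak_frac=0.5):
    """Locate the pre-printed vertical red lines of a ledger form."""
    # A pixel is 'red' when its red channel clearly dominates green and blue.
    b = img[:, :, 0].astype(np.int16)
    g = img[:, :, 1].astype(np.int16)
    r = img[:, :, 2].astype(np.int16)
    red_mask = (r - g > 40) & (r - b > 40)

    # Count how many red pixels stack up in each pixel column.
    stacks = red_mask.sum(axis=0)
    if stacks.max() == 0:
        return []  # no red ink found at all

    # Columns whose stack reaches a fraction of the tallest one are lines.
    candidates = np.flatnonzero(stacks >= stacks.max() * min_peak_frac)

    # Collapse adjacent candidate columns into one position per line.
    lines = []
    for x in candidates:
        if not lines or x - lines[-1] > 10:
            lines.append(int(x))
    return lines  # x-positions bounding sequence number, date, sender, ...
```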
The effect was immediate and dramatic. Where Ana previously confused columns and read sequence numbers as names, she now knew exactly which strip of the form she was looking at. Column assignment went from 67 percent to 100 percent correct — not by reading better but by looking better before reading.
For the carbon copies we had to think differently. A carbon copy is a sheet of paper that lay beneath the original while it was being typed, so each keystroke of the typewriter produced a second impression through the carbon paper. That impression is blue to purple, weak, and fades over the decades. If you view such a sheet in greyscale, the text almost completely disappears into the background. But the ink isn’t gone — it lives in a specific colour channel.
That was the second breakthrough. If you extract the blue channel of the scan separately and subtract the red channel from it, you isolate precisely the carbon ink and suppress the background colour of the yellowed paper. The difference is sometimes spectacular: what looks like a uniform beige surface in the normal scan becomes readable text with recognisable letters and words in the blue-minus-red channel.
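The transform itself is only a few lines. The final inversion is a convenience I add in this sketch, so the recovered text reads dark on light, the way a text recognition engine expects it:

```python
import cv2
import numpy as np

def isolate_carbon_ink(img):
    """Blue-minus-red: keep the carbon ink, suppress the yellowed paper."""
    blue = img[:, :, 0].astype(np.int16)  # carbon ink is strong here
    red = img[:, :, 2].astype(np.int16)   # yellowed paper is strong here
    diff = np.clip(blue - red, 0, 255).astype(np.uint8)

    # Stretch the faint difference to the full range, then invert so the
    # ink ends up as dark text on a light background.
    stretched = cv2.normalize(diff, None, 0, 255, cv2.NORM_MINMAX)
    return 255 - stretched
```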
But one image transformation isn’t enough, because every transformation reveals something and hides something else. Channel separation makes the ink visible but loses fine detail. A gamma correction that brightens dark areas reveals text in the shadows but blows out the light regions. A CLAHE operation — a technique that adjusts contrast locally per block of the image rather than across the whole frame — brings structure back to flat areas but introduces raster artefacts. An inversion of the image, simply the negative, sometimes makes letterforms visible that are unrecognisable in the positive.
So we built an engine that doesn’t produce one version but ten. Flat-field correction to compensate for the light falloff of the photograph, channel separation in five variants, CLAHE, adaptive thresholding that converts the image to black-and-white based on local brightness, inversion with contrast enhancement, gamma correction at multiple levels, sharpening, and combinations of these techniques — for instance flat-field followed by blue channel followed by CLAHE, which we call the triple stack.
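Roughly, and reusing isolate_carbon_ink from the sketch above, the variant engine looks like this. Every parameter value is illustrative, and the real engine produces more combinations than I show here:

```python
def flat_field(channel, sigma=51):
    """Even out light falloff by dividing by a heavily blurred copy."""
    blur = cv2.GaussianBlur(channel, (0, 0), sigma)
    return cv2.divide(channel, blur, scale=192)

def make_variants(img):
    """Produce several views of one scan, each revealing something different."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return {
        "blue_minus_red": isolate_carbon_ink(img),
        # Local contrast per block: structure in flat areas, raster artefacts as the cost.
        "clahe": clahe.apply(gray),
        # Black-and-white based on local brightness.
        "adaptive": cv2.adaptiveThreshold(
            gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
            cv2.THRESH_BINARY, 31, 10),
        # The negative with its histogram stretched.
        "inverted": cv2.equalizeHist(255 - gray),
        # Gamma below one brightens the shadows, blows out the light regions.
        "gamma": np.clip(255 * (gray / 255.0) ** 0.5, 0, 255).astype(np.uint8),
        # Unsharp mask: sharpen by subtracting a blurred copy.
        "sharpened": cv2.addWeighted(
            gray, 1.5, cv2.GaussianBlur(gray, (0, 0), 3), -0.5, 0),
        # The triple stack: flat-field, then blue channel, then CLAHE.
        "triple_stack": clahe.apply(isolate_carbon_ink(
            cv2.merge([flat_field(c) for c in cv2.split(img)]))),
    }
```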
The question then became: what do you do with ten versions of the same document?
The answer turned out to be the most important insight of the entire week. We put four readers to work simultaneously, each with a different version of the same carbon copy. The document was a personnel overview of Bureau Inlichtingen from March 1945 — exactly the kind of scan Ana had previously labelled unreadable.
The first reader, working from the blue-minus-red version, read almost nothing. The transformation that works well on other carbons made this particular document too dark. The second reader, on the triple stack, recognised the table structure and guessed it was a bulletin, but filled the lines mostly with question marks. The third reader, on the gamma version, read the title correctly: “Opgave van het Militair Personeel bij het Bureau Inlichtingen” (statement of military personnel at Bureau Inlichtingen), the date “per 9 maart 1945” (as of 9 March 1945), seven place names — London, Eindhoven, Arnhem, ’s-Hertogenbosch, Helmond, The Hague, Brussels — personnel counts per location, and four military abbreviations: O.I.D., G.I.D., I.S., A.B.S. The fourth reader, on the CLAHE version, read words the third had missed: “bijzondere rechtspleging” (special jurisdiction), “verordeningen” (ordinances), “bezettingsautoriteiten” (occupation authorities) — likely text bleeding through from the reverse side of the sheet.
No single reader got more than thirty percent of the document right, and the first two didn’t get past the title. But together they delivered the complete document: title, date, structure, locations, counts, abbreviations, and even bleed-through text from the back. The merge of four partial readings was richer than the best individual reading could ever have been.
The principle behind this is what I call transform, diverge, merge. Transform the source through multiple visual lenses. Let each lens read divergently — independently, without knowledge of what the others see. And merge the results not by averaging but by taking the union: everything at least one reader reads with sufficient certainty counts. Three readers each reading thirty percent don’t deliver thirty percent together but eighty, because they read different thirty percents. And the last twenty percent doesn’t come from yet another image transformation but from context: the lexicon of known names we send along, the expected field structure from the source profile, and the chronological logic of the register.
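In code, the merge is almost embarrassingly simple, and that simplicity is the point. This sketch assumes each reading maps a field key to a text and a confidence score; the schema and the certainty bar are illustrative:

```python
def merge_readings(readings, min_confidence=0.7):
    """Union of partial readings: keep everything any reader saw with
    enough certainty, preferring the most confident value per field."""
    merged = {}
    for reading in readings:
        for field, (text, confidence) in reading.items():
            if confidence < min_confidence:
                continue  # below the certainty bar
            if field not in merged or confidence > merged[field][1]:
                merged[field] = (text, confidence)
    return merged

# Three partial readers together recover what no single one could.
reader_a = {("row1", "date"): ("9 maart 1945", 0.9)}
reader_b = {("row1", "place"): ("Eindhoven", 0.8)}
reader_c = {("row1", "place"): ("E1ndhoven", 0.4), ("row1", "unit"): ("O.I.D.", 0.85)}
print(merge_readings([reader_a, reader_b, reader_c]))
# date, place and unit all present; the low-confidence misread is dropped
```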
What we also discovered — and what sounds obvious in hindsight but wasn’t — is that standard image metrics don’t measure what you think they measure. We tried to determine automatically which scans were hardest by looking at contrast and brightness. The scan that scored lowest turned out to be a request slip from the National Archives — a white piece of paper on a light table, trivially readable but with little pixel variation. The genuinely difficult scans, the carbons with their faint ink and yellowed paper, scored higher on contrast because they had more colour variation. The metric that matters isn’t raw pixel statistics but ink presence per document type, and that’s context-dependent.
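One way to phrase a better metric: not contrast, but the fraction of pixels that look like ink once the ink-isolating transform for that document type has been applied. A rough sketch for the carbons, reusing isolate_carbon_ink from above, with an illustrative threshold:

```python
def ink_presence(img, ink_threshold=60):
    """Fraction of pixels dark enough to be ink after the type-specific
    ink-isolating transform (blue-minus-red for carbon copies)."""
    return float((isolate_carbon_ink(img) < ink_threshold).mean())
```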
The final link in the chain is the correction step. After reading, Ana compares every name and place name against a lexicon of known terms from the domain — names of persons, organisations, places, and abbreviations that appear in the Brinks research. When Ana reads “Maastrichr” and the lexicon knows “Maastricht,” it gets corrected. When Ana expects the same sender in consecutive lines and reads “idem,” she fills in the name from the previous line. When Ana places a date in a name field, the correction step recognises the pattern and flags it as an error. These corrections aren’t large — the difference between 90 and 95 percent — but across tens of thousands of entries in hundreds of scans they add up to thousands of errors caught before they can propagate as noise through the system.
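Three of those corrections fit in one small sketch. The entry schema and the tiny stand-in lexicon are assumptions; the real lexicon comes from the Brinks research:

```python
import difflib
import re

# A tiny stand-in for the domain lexicon of known names, places and units.
LEXICON = ["Maastricht", "Eindhoven", "Bureau Inlichtingen", "O.I.D.", "G.I.D."]

def correct_entry(entries, i, lexicon=LEXICON):
    """Post-correct one ledger entry against domain knowledge."""
    entry = dict(entries[i])

    # "idem" inherits the sender from the previous line.
    if entry["sender"].strip().lower() == "idem" and i > 0:
        entry["sender"] = entries[i - 1]["sender"]

    # "Maastrichr" snaps to "Maastricht": closest lexicon term, if close enough.
    match = difflib.get_close_matches(entry["sender"], lexicon, n=1, cutoff=0.85)
    if match:
        entry["sender"] = match[0]

    # A date pattern in a name field is flagged, not silently accepted.
    if re.fullmatch(r"\d{1,2}-\d{1,2}-\d{2,4}", entry["sender"]):
        entry.setdefault("flags", []).append("date_in_name_field")

    return entry

rows = [{"sender": "Bureau Inlichtingen"}, {"sender": "idem"}, {"sender": "Maastrichr"}]
print([correct_entry(rows, i)["sender"] for i in range(len(rows))])
# -> ['Bureau Inlichtingen', 'Bureau Inlichtingen', 'Maastricht']
```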
What all of this changes together is not the speed at which Ana reads, but the kinds of sources she can handle. Last week carbon copies were unreadable. Now they’re the best-documented source type in the pipeline, with ten image transformations, four parallel readers, and a merge that yields more than any reader alone. Last week Ana confused columns in registers. Now she detects the column lines herself and knows exactly which field she’s looking at before she starts reading.
There are no unreadable documents. There are documents that haven’t been looked at in enough ways yet. That goes for carbon copies from 1945, for faded deeds from the seventeenth century, for damaged manuscripts, and for every other piece of source material that seems too degraded to process. The archival world looks with two eyes and puts it down when it’s too faint. Ana looks with ten lenses at once, compares what each lens sees, and learns a new way of seeing with every source that comes next.