Verifying Before Celebrating
Build Log


One agent got me excited that I might be descended from Scottish Highland nobility. Another agent, running in the background, quietly said no. I’m glad I had the second one running.

Corey Thomas
April 2026
12 min read
two·agents

I’ve been running a family research project — multi-session, multi-branch, the kind of thing where you need to keep ten loose ends alive at once and each one needs its own state file because the conversation window keeps compacting out from under you. This whole thing is less than two days old. Somewhere in the first long session I realized I wasn’t really doing genealogy anymore. I was doing workflow design for a specific shape of problem: investigations with noisy sources, real stakes if you get it wrong, and a small army of AI agents I could dispatch in parallel to help or hurt the cause.

The ancestors have their own posts. This one is about the scaffolding — and the one habit that matters more than anything else in the stack.

Two agents disagreeing

I was clicking through an old user-submitted family tree on a big ancestry site, using a logged-in browser session and a live agent that could read pages, type into forms, click links, navigate between profiles. Mostly it was stenography — “go to this page, read it, report what it says” — but the effect was compounding. Each click back was one more generation. Five generations in I was in 17th-century Scotland. Six generations in I was at a baronetcy with its own Wikipedia article. Seven in I was reading about a Jacobite rebel who was arrested at Dumbarton Castle in 1715.

The agent was into it. I was into it. We were having a good time clicking through generations of dead Scots.

Meanwhile, in a different tab, a second agent I’d dispatched at the start of the session was grinding through the independent reference sources — Wikipedia, the Scots Peerage, the peerage lookup databases, a handful of academic genealogy archives. Its job: verify that the descent claim we were getting excited about was actually real. It was running in the background and honestly I’d forgotten about it by the time the castle-and-Jacobites thing landed.

When the verification agent came back, it was a clean no.

The chronology was impossible. The alleged father was seven years old when the alleged son was born. The tree had fabricated an earlier birth date for the father to make the math work, and no independent source supported it. The fallback interpretation (“well, maybe he was a brother instead of a son”) also didn’t hold — every peerage reference listed the father as having exactly one documented son, and it wasn’t the one we wanted. The geography of the baptism record didn’t match the geography of the family seat. Ninety miles apart, on opposite coasts, highland versus lowland. No Highland baronet was baptizing sons in that Lowland parish in that year. The whole thing was a hinting-system collision — the ancestry site’s algorithm had fused two unrelated records because a name and a date matched, and a human builder had accepted the fusion without checking the underlying documents.

I read the report twice. Then I stopped telling anyone about the Jacobite.

This is the workflow post. The Jacobite is a cameo.

The stack

Here’s what I converged on running these investigations:

A persistent scratch file per investigation. One markdown file, loaded and updated every session, structured as running state — what we know, what we’re chasing, what’s blocked, what’s been verified. When the conversation context gets too long and the model starts forgetting the middle, I can paste the file back in and pick up where I left off. The file is the real memory. The conversation is the scratchpad.
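For concreteness, here is roughly the shape mine takes. The section names are my own convention, nothing standard:

```markdown
# Investigation: the Scottish line
Last updated: <date>

## Verified
- Facts with at least one independent primary source behind them.

## Chasing
- Open threads, one bullet each, with the next concrete step.

## Blocked
- Waiting on external replies (archives, records centers).

## Refuted
- Claims that failed verification, kept with a link to the report.
```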

A state file for the current session. Different thing. Same directory. The scratch file is for the project; the state file is for “if the conversation crashes or I have to walk away, here’s what to resume on.” Single file, frontmatter with focus + timestamp, a “resume from here” section at the top. I rewrite it at natural breaks. It has saved me five or six times now.
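A minimal sketch of what that looks like for me, with illustrative content (the frontmatter carries just the two fields mentioned above, focus and timestamp):

```markdown
---
focus: verify the descent claim against peerage sources
updated: 2026-04-12T21:40Z
---

## Resume from here
1. Verification agent still running; read its report before anything else.
2. Scratch file updated through the baptism record; tree click-through is next.
```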

Parallel dispatch as the default. If I need to do three things that don’t depend on each other, I fire off three agents at once. One to verify a claim, one to update a memory file, one to pull a source I haven’t looked at yet. The main conversation stays responsive. I’m a dispatcher at that point, not a worker.
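The pattern is just fan-out over independent tasks. A minimal sketch in Python, where `dispatch` is a stand-in for whatever “run a sub-agent in the background” call your tooling actually provides:

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch(task: str) -> str:
    """Stand-in for firing off a sub-agent; a real version would call
    your agent framework's background-run API and return its report."""
    return f"report for: {task}"

tasks = [
    "verify the current claim against independent sources",
    "update the scratch file with tonight's findings",
    "pull a source we haven't read yet",
]

# Fire all three at once; the main thread stays free to keep working.
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    reports = list(pool.map(dispatch, tasks))

for report in reports:
    print(report)
```

The point of the sketch is the shape, not the machinery: the dispatcher never blocks on any single task, and the reports come back whenever they come back.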

Every major claim gets a verification pass before it’s written down. This is the one that matters. This is the whole post.

Paper sources beat tree hints. Always. A letter to a county clerk that comes back with a 96-year-old marriage register page is worth fifty suggested relationships in a user-submitted tree. One request, one scanned image back, and a family mystery that had been open for three generations cracked open. The tree hinting I’d been chasing would never have gotten there.

Transparent credits in everything you publish. When you write it up, say what you did, what the AI did, where you pushed back, where it caught something you missed, where you caught something it missed. The split is the story. Pretending you did it alone is worse than pretending AI did it alone — the first one is a lie, the second one is a confession of not understanding your own tools.

The draft card

First investigation. A three-generation family mystery — an ancestor whose biological parents had been unknown for as long as anyone could remember. “Probably adopted, never figured out” was the family lore, and the lore had been stable for decades.

This one cracked in a single long research session built on top of a handful of earlier failed passes. The failed passes were critical — without them the successful pass wouldn’t have known where to look. The successful pass was really a big state-file walk: re-read the primary sources I already had, look for what they actually said instead of what we’d been reading them as, check handwriting I’d been treating as settled, cross-reference against newer records that had been digitized since the last time anyone looked.

The breakthrough was a draft card. Not the ancestor’s draft card — a different ancestor’s draft card, one generation down, filled out in the subject’s own handwriting in 1942, decades after the original event. He’d written his own birthplace on it. The birthplace the family tree had recorded for him was wrong. Once we trusted his own hand over the tree, the whole search shifted to a different state’s records, and three new generations of ancestors fell into place in one night.

The technique: trust the primary source hand over the secondary source record. Handwriting on a contemporary document beats an entry in a tree built 50 years later. Always. If they disagree, the tree is probably wrong.

The AI contribution: parallel rereading of a single census page image for an hour, trying different handwriting interpretations, cross-matching against other people in the neighborhood to infer the correct spelling of a surname we’d been reading wrong. I couldn’t have done that without the model — my pattern-match on 19th-century cursive handwriting is not good. The model’s isn’t either, strictly speaking, but it’s patient in a way I’m not, and it’ll hold a dozen different spelling hypotheses in memory at once. That matters.

The paper letter

Second investigation. Different mystery, same family, decades-older question.

This one didn’t break digitally at all. It broke through the mail.

I wrote a letter to the historic records center at a county courthouse three states away, asked if they still had the original marriage register page from a specific marriage I knew the date of, and offered to pay for a scan. The response came faster than I expected — an email from the records center staff with a PNG attached. Book two, page 327, record number 166, one line of handwriting from 1930.

The line named a person. A 4x-great-grandmother nobody alive in the family had ever heard of. The name had been sitting on that page in that courthouse for 96 years, and the only thing between us and it was a formal request to a person who had the job of retrieving it.

No AI in this one except post-hoc. I wrote the letter myself. The agent I had running just tracked the thread, kept the expected-response state alive in the scratch file, and flagged it the day the reply actually landed so I’d go update the record. That’s a legitimate use too. Not every technique has to be flashy.

The technique: write the letter. When a digital trail goes cold, the next move is almost always to find the physical archive that holds the document, find the person whose job it is to retrieve documents there, and ask politely. It works more often than you’d guess. Archivists are patient. Their whole profession is predicated on someone eventually coming back for the document.

The one that almost worked

Third investigation. The Jacobite.

This one is the whole reason the post exists. The AI helped me build up a false story, and another AI, dispatched early enough, caught me before I published it.

The build-up was real work. Four generations of click-through on a user-submitted tree, each click surfacing a new profile page with real-looking sources cited: a heraldry directory, a knights’ catalogue from the 1700s, a family records volume. I had a 12-generation pedigree by the time the sixth click landed, and every step felt earned because each intermediate profile had its own small trail of evidence. The cumulative weight of all the clicks felt like proof.

It wasn’t proof. It was consistency inside one tree. The tree was built by a real person, who had made one foundational mistake ten generations up, and every subsequent generation inherited the mistake without revisiting the original join. I was retracing their error on my side of the screen, feeling the same confidence they must have felt when they built it.

The thing that saved me was the parallel verification agent I’d dispatched at the start of the session with one job: cross-check the descent claim against independent peerage sources. I’d forgotten it was running. When it returned its findings while I was reading the Wikipedia page on the Jacobite, the contrast was brutal.

Every peerage reference listed the alleged father as having exactly one documented son. The chronology required a fabricated earlier birth date that no source supported. The baptism record’s geography was incoherent with the family seat. No clan society, no Scots-in-colonial-America database, no probate index in the relevant jurisdiction turned up anything matching the story the tree was telling. The verification agent’s verdict was clean: refuted, leaning strongly toward a hinting-system collage built around a real person with the same name but a different biography.

The technique here is the only one that actually matters. Run the verification agent at the start of the session, not at the end. Dispatch it the moment you start getting excited about a claim. Let it work in parallel while you continue to build. When it comes back, read its report before you tell anyone about what you found. If it says no, adjust. If it says yes, celebrate. Verify, then celebrate. That’s the whole discipline.

I got lucky. I had the agent running when I needed it. Next time I might not. So I’m writing this post as a commitment to myself as much as anyone else: the verifier goes first. Build the scaffolding for that into the workflow.

What I do now

After all this, the workflow I run looks like this:

  1. Start every session by loading the project’s scratch file and the session state file. If the state file doesn’t exist, create one. First thing, every time.
  2. Dispatch verification agents on anything exciting the moment it starts getting exciting. Don’t wait for a conclusion. Dispatch while you’re still mid-build.
  3. Keep the main conversation responsive by offloading anything that takes more than one tool call to a sub-agent. The main conversation is for deciding what to do next, not for doing it.
  4. Write the letter. When a digital trail goes cold, the answer is almost always in a physical archive that hasn’t been scanned yet.
  5. Trust primary sources over secondary sources. A draft card in the subject’s own hand beats a tree entry from 50 years later.
  6. Update the scratch file at every natural break. It is the only thing that will survive a compaction. Treat it as the canonical record.
  7. Verify before you celebrate. Verify before you tell anyone. Verify before you write it down in a way that makes it hard to retract.
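Step 1 is mechanical enough to script. A sketch, assuming a `research/` directory holding the two files described above (the paths and field names are mine, not a convention):

```python
from datetime import datetime, timezone
from pathlib import Path

PROJECT = Path("research")            # hypothetical layout
SCRATCH = PROJECT / "scratch.md"      # project memory
STATE = PROJECT / "session-state.md"  # resume-from-here file

def start_session() -> tuple[str, str]:
    """Load the scratch file; create a fresh state file if none exists."""
    PROJECT.mkdir(exist_ok=True)
    scratch = SCRATCH.read_text() if SCRATCH.exists() else ""
    if not STATE.exists():
        stamp = datetime.now(timezone.utc).isoformat(timespec="minutes")
        STATE.write_text(
            f"---\nfocus: (set me)\nupdated: {stamp}\n---\n\n"
            "## Resume from here\n- (nothing yet)\n"
        )
    return scratch, STATE.read_text()

scratch, state = start_session()
```

Nothing clever here by design: the point is that the session can crash at any moment and the two files are always in a state you can resume from.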

None of these are AI-specific. Good AI tools just amplify them. The discipline isn’t “use AI” — it’s “treat AI like something that will confidently build you a beautiful wrong answer unless you budget for checking it.” Everything else falls out of that.

One last thing

The Jacobite story is still in my scratch file. I haven’t deleted it. I left it there as a footnote, flagged as refuted, with the verification agent’s full report linked. When I need to remember why the workflow exists, I read that footnote.

Two agents disagreed, one in each tab, and I’d picked the wrong one to believe. The one I’d forgotten about was the one that was right. Every other AI-research post on this blog is, honestly, downstream of that moment.

Next time, the verifier goes first.

Credits

The three investigations referenced here ran across a handful of sessions with Claude (Anthropic, claude-opus-4-6 1M context) in under two days of wall-clock time. That pace is the whole reason the verification discipline matters — traditional genealogy work takes weeks or years, and moving at this speed without a verification pass is exactly how you end up publishing a wrong answer. The parallel-dispatch pattern, the verification-as-first-class-pattern discipline, and the specific structure of the scratch/state file split were all developed collaboratively in the conversation — most of them by running into cases where the absence of the pattern cost me time or produced a wrong result. Where Claude caught mistakes I was about to make, I tried to say so at the time; where I caught mistakes Claude was about to make, I said so too. The split is the story. The Jacobite episode was Claude’s agents on both sides — one clicking through the tree, one fact-checking independently — and the fix was a report Claude wrote refuting a hypothesis Claude had spent an hour building.

The case studies are drawn from a private family research project. Specific names, dates, and relationships are held in a password-protected section of this site. This post is about the workflow. The people are elsewhere.

Post · Sawtooth Studio · April 2026