What I Hand Off

A few weeks ago I added a --title flag to this blog’s new:post scaffolder so an agent could create a post without sitting through the interactive prompts. The flag was an easy hand-off because two things were true. The stakes were low enough that a wrong scaffold cost me nothing to throw away. The domain was one I know cold, so if the model produced something off, I’d see it in the diff before I ran anything.

Most of my hard calls about where to trust a model come down to those two axes. Four combinations, four different answers.

Low stakes in my domain. The model gets a free hand. Repro scripts for bugs I’m chasing, log queries I’ll run once, internal helpers nobody else builds against. I rarely read the diff carefully. If something breaks, I throw it away.

High stakes in my domain. Verification is the work. The model writes refactors that touch dozens of files, migrations across services, fuzz tests scoped to a single API surface. I read every line before it ships, because the bar is higher than for code I can throw away. Calls that depend on context the repo doesn’t have, like architectural framing or the argument in a PR thread, stay mine alone.

Low stakes outside my domain. The cheapest place to learn. A throwaway PoC in a stack I haven’t shipped in, or a script that has to call into a service my team doesn’t own. I let the model do most of the work and read its output the way I’d read a tutorial. If I get something wrong, the cost is doing the exercise twice.¹

High stakes outside my domain. Where I slow down. A database migration in an engine I haven’t operated, or auth flow in a framework I haven’t shipped with. I read up and pull in another engineer who knows the area. The model’s output is a starting point. The model is most confident exactly where I’m least equipped to push back on it, and the most expensive mistakes I’ve watched engineers make have lived in this quadrant.

#How this might be wrong

The rule depends on me being honest about which quadrant I’m in. The trap is the engineer who thinks they’re a subject matter expert but isn’t. The model’s output will look correct, and the work will ship before anyone notices the gap that someone with real expertise would have flagged in seconds.

The same trap moves to the engineers around me. Most of my work lives in someone else’s diff, and the calibration applies to their decisions as much as to mine. A strong engineer leaning on a model in an area outside their depth doesn’t have the verification reflex they have inside it. Reading their PR means reading their quadrant first, then the code.

Trust but verify is the principle. The flag on new:post was the easy version. Most of the verification belongs to somebody else, and my call is whether they’re calibrated for it.

Three times if I’m being honest. ↩

#How this might be wrong

#Footnotes