
Jidoka for Software: Autonomation in the Age of LLMs

With coding agents, the bottleneck in software engineering is now intent: knowing what to build, specifying it precisely enough that an agent can execute it, and verifying that the result actually works. A small team that understands this shift will outproduce a large team that doesn’t.

Teams run a factory floor now

In 1896, Sakichi Toyoda built a loom that could detect when a thread broke and stop itself automatically. This was not a faster loom. It was a different kind of loom: one that separated the act of production from the act of quality judgment. A single worker could now oversee dozens of machines instead of watching one, because the machines would signal when they needed human attention. Toyoda called the principle jidoka, sometimes translated as “autonomation”: automation with a human touch.

Taiichi Ohno later made jidoka one of the two pillars of the Toyota Production System (alongside just-in-time manufacturing), and the insight underneath it is worth stating precisely: the highest-leverage thing a floor manager does is not pick up a wrench. It is to keep the line flowing by designing fast stops, surfacing abnormalities immediately, and fixing root causes so the line can run unattended. The wrench work is important. But the moment teams confuse the wrench work for the job, they have misidentified where value is created.

I think software engineering is going through this transition right now, and most teams haven’t noticed.

When modern coding agents (Claude Code, Cursor, and similar tools) can produce a correct, idiomatic, test-passing implementation of a well-specified task in minutes, the act of typing code is no longer the bottleneck. It is production work. Important production work, the way Toyoda’s looms still needed to weave thread, but production work nonetheless. The bottleneck has moved upstream, to the specification of what to build and downstream, to the verification of whether it actually worked. The person who can run five agent sessions in parallel, specifying tasks clearly, reviewing output critically, and routing work around blockers, will outproduce the person who writes beautiful code one function at a time. Not because they’re a better engineer in any traditional sense, but because they’ve correctly identified where the constraint is.

Running multiple agent sessions at once is a skill. It takes practice and it’s hard. But that is the skill teams are building now, and it is what will separate a team of ten from a team of a hundred.

Each engineer owns a domain, not a layer

The traditional way to organise an engineering team is by technical specialty. Frontend engineers, backend engineers, infrastructure engineers, each defined by the layer of the stack they inhabit. This made sense when the scarce resource was deep expertise in a particular technology. If a team’s React specialist is the only person who can build the payment form, that person stays focused on React.

Agents change the equation. The agent is the specialist who knows TypeScript and Python and SQL and whatever else the task requires. It has often read the docs more recently than most humans on the team. (It has, in fact, read all the docs, which is more than can be said for most of us.) What the agent cannot do is decide what to build, why it matters, and whether the result actually serves the business need. That requires understanding the domain: the payments flow, the search experience, the integration surface, the user’s actual problem.

When boundaries between domains are blurry, ownership decays and integration risk climbs. A practical approach is to structure around domains, not layers. Each engineer owns a business domain end-to-end. They write the spec, run the agents, review the output, and ship it. The same person who decides “this webhook handler needs retry logic” also verifies that the retry logic behaves correctly in production. The feedback loop is one person wide, and that is the point.

The architecture that makes this work is boundaries. The only thing that matters between domains is the contract: the API, the types, the interface. Inside a domain, teams can refactor freely, change implementations, and let agents restructure entire modules. But the boundary between domains gets reviewed by both sides, because that’s where integration risk lives.

No cross-domain imports. No reaching into another domain’s database. These rules sound restrictive, and they are. They are also what makes it possible for ten people (and their agents) to move fast without constantly breaking each other’s work. Constraints that enable speed are not restrictions. They are infrastructure.
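As a concrete illustration, the “no cross-domain imports” rule can be enforced mechanically rather than by convention. The sketch below assumes a hypothetical layout in which each domain lives under `domains/<name>/` and exposes a `contract` module as its only public surface; the domain names and module layout are invented for the example.

```python
import ast
from pathlib import Path

CONTRACT = "contract"  # assumed name of each domain's public interface module

def imported_modules(node: ast.AST) -> list[str]:
    """Module paths referenced by an import statement."""
    if isinstance(node, ast.Import):
        return [alias.name for alias in node.names]
    if isinstance(node, ast.ImportFrom) and node.module:
        return [node.module]
    return []

def boundary_violations(repo_root: str) -> list[str]:
    """Flag cross-domain imports that bypass the other domain's contract."""
    violations = []
    for path in Path(repo_root).glob("domains/*/**/*.py"):
        owner = path.relative_to(repo_root).parts[1]  # this file's own domain
        for node in ast.walk(ast.parse(path.read_text())):
            for mod in imported_modules(node):
                parts = mod.split(".")
                if len(parts) >= 2 and parts[0] == "domains" and parts[1] != owner:
                    # Reaching into another domain: only its contract is fair game.
                    if len(parts) < 3 or parts[2] != CONTRACT:
                        violations.append(f"{path}: imports {mod}")
    return violations
```

Run in CI, a nonzero violation count fails the build, which is what turns the boundary rule from a review comment into infrastructure.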

Platform engineers as force multipliers

In many small, high-output teams, two or three engineers may ship zero features. This is deliberate. Their job is to make everyone else dramatically faster.

Isolated environments so agents can boot and validate the app per change; CI that runs fast enough to keep agents unblocked; agent-first repository knowledge; observability tooling. These are the jigs and fixtures of a software factory floor. A factory without good tooling is just a room full of expensive machines producing inconsistent output. A factory with great tooling produces consistent output almost regardless of who’s operating the machine, which is precisely the property teams need when the “operator” is an LLM.

When an agent produces bad output, the response is never “try harder.” It is “what guardrail is missing, and how can it be enforced?” That question is the platform team’s entire job.

This is the poka-yoke principle from Toyota’s system: mistake-proofing not through diligence but through design. If a standard isn’t enforced in CI, it does not exist. Coverage thresholds, import-boundary linters, complexity limits, architecture tests: these are the automated gates that agents cannot bypass. Agents are remarkably compliant. They tend to follow explicit, enforced guardrails more consistently than humans, and they drift when guardrails are vague. They also produce confidently wrong output if no guardrails exist. The agents won’t raise the bar for themselves. The platform team raises it for everyone.
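To make one of those gates concrete, here is a sketch of a complexity limit enforced over a source tree. The threshold and the branch-counting measure are deliberate simplifications (a real setup would lean on established linters and coverage tools); the point is only that the gate is code that blocks the build, not a request for diligence.

```python
import ast
from pathlib import Path

MAX_BRANCHES = 8  # assumed team threshold; tune per codebase

# Node types counted as "branches" in this simplified complexity measure.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)

def over_budget(source: str) -> list[tuple[str, int]]:
    """(function name, branch count) for every function over the limit."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            branches = sum(isinstance(n, BRANCH_NODES) for n in ast.walk(node))
            if branches > MAX_BRANCHES:
                offenders.append((node.name, branches))
    return offenders

def gate(repo_root: str) -> int:
    """CI entry point: a nonzero exit code blocks the deploy."""
    failed = False
    for path in Path(repo_root).rglob("*.py"):
        for name, count in over_budget(path.read_text()):
            print(f"{path}:{name} has {count} branches (limit {MAX_BRANCHES})")
            failed = True
    return 1 if failed else 0
```

An agent asked to add a tenth conditional to an already tangled function does not negotiate with this check; it refactors until the gate passes, which is exactly the behaviour the platform team wants.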

This is where the jidoka parallel is sharpest. Toyoda’s loom didn’t just weave faster. It detected its own defects and stopped. A CI pipeline designed for agents doesn’t just build faster. It detects violations of engineering standards and blocks the deploy. The human isn’t watching every thread. The human designed the machine to watch them.

Humans review intent; agents review agents

Here is the part that makes experienced engineers most uncomfortable.

Code review as most teams have practiced it for the past two decades is a particular ritual: one human reads every line of another human’s diff, leaves comments about naming and edge cases and architectural concerns, and eventually approves. It is a ritual built for a world where humans produce the code and the primary quality mechanism is another human’s careful attention. It works. I’ve spent years doing it and believing in it.

It does not scale when agents produce ten times the output.

The instinct is to say “then teams need to review ten times as carefully,” and this is the wrong response. Not because careful review is bad, but because it misidentifies where human review adds the most value. Agent review plus automation can catch many mechanical issues, including style inconsistencies, simple bugs, and convention violations, faster than humans when checks are enforceable. What it cannot catch is the subtle design flaw, the wrong abstraction, the implementation that technically works but solves the wrong problem. Those require understanding the intent behind the change, and that is a human judgment.

When agent output exceeds human review capacity, split review into two parts. Before work starts, review the spec: is this the right thing to build? Is the task well-defined enough for an agent to execute? Are the acceptance criteria clear? After work completes, verify behaviour: does the change do what the spec said? Does it integrate correctly? Did the production metrics move in the right direction? The diff in between gets agent review and CI. Human attention goes where only human attention helps.
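A sketch of what “review the spec, then verify behaviour” can look like in practice, using the webhook-retry example from earlier. Everything here is hypothetical (the function names, the three-attempt policy, the dead-letter queue); the point is that acceptance criteria written down before the work starts become executable checks after it completes.

```python
import time

# Spec, reviewed by a human BEFORE the agent starts:
#   1. Delivery of a webhook event is attempted at most 3 times.
#   2. Waits between attempts back off exponentially.
#   3. After the final failure, the event lands on a dead-letter queue.

class DeadLetterQueue:  # hypothetical stand-in for the real queue
    def __init__(self):
        self.events = []

    def push(self, event):
        self.events.append(event)

def deliver_with_retry(send, event, dlq, attempts=3, base_delay=1.0):
    """Agent-produced implementation of the spec above, criterion by criterion."""
    for attempt in range(attempts):
        try:
            send(event)
            return True
        except Exception:
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)  # criterion 2
    dlq.push(event)  # criterion 3
    return False
```

Human attention lands twice: once on the three numbered criteria before work starts, and once on whether a behavioural check like “three calls, then dead-lettered” holds against the real system afterwards. The diff in between is the agents’ problem.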

This is not removing humans from review. It is focusing human review on the two things that actually require humans: intent and outcome. The mechanical middle, “did the code correctly implement the spec,” is increasingly automatable. Insisting that a human eyeball every line is not rigour. It is a failure to distinguish between the parts of the process that need judgment and the parts that need consistency, and consistency is what machines are for.

This should feel uncomfortable

This model asks people to give up activities that have been central to their professional identity.

Nobody writing code by hand. Shipping without line-by-line human code review. Some engineers not opening an editor for days. These feel dangerous. The instinct says this is reckless. Those instincts were built for a world where typing speed was a meaningful factor in engineering output, and that world is ending faster than instincts can update.

The discomfort is real. I feel it too. But notice what this model has more of, not less: more automated quality gates, more architectural boundary enforcement, more verification of production behaviour, more explicit specification of intent before work starts. The guardrails haven’t been removed. They’ve been moved from manual processes (which are inconsistent, tiring, and don’t scale) to automated systems (which are consistent, tireless, and scale with the number of agents teams can run).

Shigeo Shingo, who formalised much of Toyota’s production theory, described autonomation as “pre-automation”: not full autonomy, but the stage where machines handle production and signal humans when judgment is needed. He identified twenty-three stages between purely manual work and full automation, and argued that ninety percent of the benefits come from autonomation alone, well before teams reach the end of the spectrum. I think software engineering is somewhere around stage four or five of that progression. Teams are not replacing engineers. They are changing what “engineering” means, from production to judgment, from typing to directing, from reviewing lines to reviewing intent.

The goal is not that engineers do less. It is that engineers do more, reach further, and have more impact than anyone could typing alone. Ten people running a well-tooled factory, each overseeing multiple agents executing against clear specs, with automated quality gates that enforce standards the agents alone would never set, shipping verified, production-tested changes across their entire domain.

That is how teams outship competitors ten times their size. Not by typing faster. By recognising that typing was never the bottleneck.