Source: https://x.com/MilksandMatcha/status/2044863551186309460
Author: Sarah Chieng (@MilksandMatcha), credited with @0xSero
Published: 2026-04-16 7:40 PM
Title: Single-agent AI coding is a nightmare for engineers
I pay my upfront subscription ($200/month), write what I hope is the right prompt (prompt AND context engineer), and wait. 35 minutes later, the agent is still “synthesizing,” “perusing,” “effecting,” and “germinating” (who came up with these?).
By the end, I have files of bad code, a bloated context window, and I’m counting the remaining tokens on my left hand.
Okay, I grab an apple, compact, type some heavy-handed verbal abuse, re-explain everything from scratch, and pray the next attempt gets further than the last one… only to be disappointed by the same result.
By now, the spark and joy of AI coding are long dead.
Stop being a one-shot Sloperator
This is the single-agent ceiling. Every developer building with AI agents hits it the moment their project graduates from a 3D HTML snake game to anything more practical. This happens for two reasons:
- we expect too much from a single agent
- we do not break problems into simple enough, verifiable tasks
And while this is the point where most people will sell you (a) a useless course on prompt engineering, (b) another SaaS tool that manages your context, or (c) a lecture on why you haven’t tried the new model that came out seconds ago, we won’t be doing that today.
Instead, we’re going to walk you through what actually works: running a proper back of house. Multi-agent workflows.
Welcome to the back of house
There are a few reasons why multi-agent workflows have become much more practical in recent weeks: underlying models have gotten better, and popular AI coding agents have made multi-agent orchestration easier to set up. In the last quarter, OpenAI rolled out deeper orchestration in Codex workflows, while Anthropic continued expanding Claude Code and the MCP ecosystem.
The biggest unlock, though, is speed. One of OpenAI’s latest models, Codex Spark (powered by @cerebras) runs at roughly 1,200 tokens/second, which makes it practical to introduce parallel and verification steps that would otherwise be too time-costly to run.
For an example task using Codex and the Figma MCP to copy a website into Figma, the single-agent workflow averaged 36.5 minutes per run with an average of 12 manual interventions (and a 100% failure rate), while the multi-agent workflow leveraging Codex Spark took 5.2 minutes, needed 2 manual interventions, and succeeded on the first try.
What is a multi-agent workflow?
Multi-agent workflows fix the single-agent ceiling at the architecture level. Instead of one cook doing everything, you have a head chef who takes the order, breaks it into scoped, verifiable tickets, and hands each one to a line cook to execute.
The Head Chef (Orchestrator): The Head Chef’s job is to take the order from the human, break it into a working list of tickets, then call line cooks to each go out and complete one smaller, scoped job. The orchestrator is responsible for planning, coordination, and task decomposition. Its only tool is delegate_task, and it only sees high-level goals plus summaries of subagent outputs.
The Line Cooks (Subagents): The Line Cook’s job is to take the ticket (task assignment) given by the Head Chef and get the job done, no questions asked. Each line cook gets its own fresh station (context window), does its work, returns the plate, and clocks out. Subagents can read, write, use MCPs, and any other tools needed. They only see their assigned prompt and a fresh context window (no prior history).
The trick to keeping things orderly: the line cook doesn’t get the full order history. It also doesn’t get your 15,000-token master plan document. It gets the minimum viable context to cook one specific dish.
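To make the division of labor concrete, here is a minimal sketch of the head-chef / line-cook split. Everything here is illustrative: `run_subagent`, `delegate_task`, and `orchestrate` are hypothetical names, and `run_subagent` is a stub standing in for whatever agent runtime you actually use (Codex, Claude Code, etc.).

```python
# Hypothetical sketch of the orchestrator/subagent split.
# `run_subagent` is a stand-in for a real agent runtime call.

def run_subagent(ticket: str) -> str:
    """Line cook: fresh context, one scoped job, returns a short summary."""
    # In practice this would spawn an agent whose ONLY context is the ticket.
    return f"done: {ticket}"

def delegate_task(ticket: str) -> str:
    """The orchestrator's only tool: hand a ticket to a line cook."""
    return run_subagent(ticket)

def orchestrate(order: str, tickets: list[str]) -> list[str]:
    """Head chef: sees only the order, its tickets, and the summaries."""
    return [delegate_task(t) for t in tickets]

summaries = orchestrate(
    "copy the marketing site into Figma",
    ["extract page structure", "export assets", "rebuild frames"],
)
# Each entry is a summary, not raw file contents or tool-call output.
```

Note the key design choice: the orchestrator never touches files or tool results directly, so its context only grows by one summary per ticket.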
Three immediate wins from running a back of house
1. Tokens: your effective context window goes from ~200K to 25M+
- The human talks exclusively to the orchestrator.
- The orchestrator is stripped of all tools other than delegate_task.
- If the orchestrator wants to take an action, it spawns a sub-agent via delegate_task.
- Each sub-agent has its own fresh context window, starting only with a prompt.
- Sub-agents can read, write, use MCPs, and any other tools.
- Sub-agents return a summary of their work back to the Head Chef.
This means the orchestrator never has to read files, write files, or see tool-call results directly, effectively extending its context window to as many sub-agents as it can spawn.
2. Control: you can enforce sequential workflows at each turn of the agentic loop
The orchestrator follows a script, spawning one sub-agent per phase:
- Sub-agent A breaks the order into a contract with subtasks and criteria.
- Sub-agent B explores the next subtask.
- Sub-agent C tests the code generated in the prior subtask. If tests pass the validation criteria, move on. Otherwise respawn the coding line cook to fix identified issues.
- Sub-agent D documents the subtask and updates the scope checklist.
- If any subtasks remain, return to Sub-agent B for the next one. Otherwise, service is done.
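The scripted loop above can be sketched as a small control function. This is a sketch under stated assumptions: `build` and `verify` are stubbed stand-ins for the coding and testing subagents, and `run_phase_loop` is a hypothetical name, not an API from any real tool.

```python
# Sketch of the scripted phase loop. In a real setup, each call to
# `build` or `verify` would spawn a fresh subagent with minimal context.

def run_phase_loop(subtasks, build, verify, max_retries=3):
    """For each subtask: build, verify, and respawn the coder on failure."""
    results = []
    for task in subtasks:
        artifact = build(task)
        for _ in range(max_retries):
            ok, issues = verify(task, artifact)
            if ok:
                break
            # Tests failed: respawn the coding line cook with the issues.
            artifact = build(f"{task} (fix: {issues})")
        results.append((task, artifact))
    return results

# Stub phases, for illustration only.
def build(task):
    return f"code for {task}"

def verify(task, artifact):
    return (task in artifact, "missing task reference")

log = run_phase_loop(["auth page", "billing page"], build, verify)
```

The retry branch is what enforces the sequential discipline: a failing subtask never silently advances the loop.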
In internal trials, this sequential loop reduced manual interventions by 84.3% compared to single-agent runs on the same brief.
3. Speed: you can run well-defined tasks in parallel
Good fits:
- generating logos, images, mascots, assets, mockups, designs, or tests
- exploring a massive codebase orders of magnitude faster
- building multiple pages quickly where each subagent works on separate parts of a codebase and doesn’t overwrite each other
Running five parallel mascot generations took roughly one minute versus five minutes sequentially, about a 5x speedup on taste-driven exploration tasks.
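The parallel fan-out can be sketched with a plain thread pool, since each generation is I/O-bound (waiting on a model API). `generate_asset` is a hypothetical placeholder for a real subagent call, not an actual API.

```python
from concurrent.futures import ThreadPoolExecutor

# Five independent generations on the same brief, run in parallel.
# `generate_asset` is a stand-in for a real subagent/model call.

def generate_asset(style: str) -> str:
    return f"mascot ({style})"

styles = ["retro", "minimal", "playful", "corporate", "pixel-art"]

with ThreadPoolExecutor(max_workers=len(styles)) as pool:
    drafts = list(pool.map(generate_asset, styles))
# Wall-clock time approaches one generation instead of five when
# each call spends most of its time waiting on the model.
```

This only pays off for independent tasks; anything touching shared files belongs in the sequential loop instead.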
5 Patterns That Actually Work
Pattern 1: The Prep Line
Run multiple independent generators on the same brief, then curate outputs manually. Best for design exploration, code variations, or test generation.
Pattern 2: The Dinner Rush
Run distinct scoped tasks simultaneously toward one shared goal. Best when tasks are deeply specified, verifiable, dependency-aware, and do not touch the same files.
Pattern 3: Courses in Sequence
Break a project into ordered waves where each wave depends on the previous one, but tasks within a wave can run in parallel.
Pattern 4: The Prep-to-Plate Assembly
Run a sequential handoff pipeline where each line cook does one bounded step, validates it, and hands the workpiece to the next station. State should live in files and task queues rather than conversation history.
Pattern 5: Here comes Gordon Ramsay
Separate builders from verifiers. One agent writes; independent reviewer and tester agents validate in parallel. This verification layer should sit on top of any other pattern.
Main takeaway
The article’s core claim is that the solo one-shot coding-agent workflow is a dead end for substantial engineering work. Better results come from explicit orchestration, scoped subagents, verification stages, and parallel execution when the dependency graph allows it.
Notes:
- Thanks credited in the article: Zhenwei Gao, James Wang, @brickywhat, and illustrator @halleychangg.