Why Specialized Agents Beat General Ones (And Why Claude Is Copying Rush)
The problem with asking one person to do everything is that they're never quite good enough at any one thing.
This obvious truth about human work has somehow been inverted in the AI space. We've spent the last two years building increasingly large language models, assuming that scale and capability would solve the problem of complex task execution. Just make the model bigger, we thought. Give it more tokens, better reasoning, shinier weights. Eventually, it will be good enough at everything.
The results have been disappointing in a specific way: general-purpose agents work great for simple tasks and fail mysteriously on complex ones.
The Math of Failure
Let's ground this in something concrete. Suppose you're building an AI agent to handle a multi-step workflow: research a competitor, draft an email, analyze the results, and schedule a follow-up. Call it four steps. If each step is 95% accurate (which is optimistic for real work), and the steps succeed or fail independently, your expected end-to-end success rate is 0.95^4 ≈ 0.814. About 81% reliability.
Now scale that up. A twenty-step workflow, which isn't unusual in enterprise environments, compounds to 0.95^20 ≈ 0.358. You're down to a 36% success rate. That's not good enough for anything that matters.
The usual response is to make the model better. And yes, 97% accuracy per step would give you 0.97^20 ≈ 0.544, a 54% success rate. Still not acceptable. You'd need 99% accuracy per step just to reach 0.99^20 ≈ 0.818, about 82% end-to-end reliability on a twenty-step workflow. That's a very high bar.
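The arithmetic is easy to check yourself. A few lines make the compounding explicit, using the per-step accuracies and step count from the scenario above:

```python
# End-to-end reliability of an n-step workflow when each step succeeds
# independently with probability p: reliability = p ** n.
def end_to_end(p: float, n: int) -> float:
    return p ** n

for p in (0.95, 0.97, 0.99):
    print(f"per-step {p:.0%} over 20 steps -> {end_to_end(p, 20):.1%}")
# prints 35.8%, 54.4%, 81.8%
```

The independence assumption is generous to the agent; in practice an early error often makes later steps more likely to fail, so real-world reliability tends to be worse than these figures.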
General-purpose agents hit a wall here. No amount of scaling gets you out of this trap because the problem isn't the model's capability on individual tasks. The problem is what happens when you force one mind to track multiple contexts simultaneously.
Context Poisoning
A research agent needs to know about competitor pricing, market positioning, technology architecture, funding history. It builds up a working memory of facts, connections, uncertainty. That's context.
An email-writing agent needs to know the recipient's role, relationship history, the specific goal of the email, tone preferences. That's a different context.
When you combine them in a single agent, something gets lost. The research context pollutes the email context. The email agent has facts it doesn't need and false confidence about what matters. The research agent's careful uncertainty about technical details becomes noise when drafting a message.
This isn't theoretical. It's observable in how general-purpose agents degrade on complex tasks. They start hallucinating because they're tracking too many threads. They lose sight of what actually matters because the working memory is crowded. They make decisions based on salient information rather than relevant information.
Specialized agents don't have this problem. A research agent that only does research has clean context. An email agent that only drafts emails isn't distracted by competitor analysis. They each maintain epistemic clarity about what they know and what they're uncertain about.
Why This Matters Now
Anthropic has apparently reached the same conclusion. Recent reports describe the next generation of Claude shifting toward what they're calling an "Agent Constellation": a swarm of specialized sub-agents that delegate to one another rather than a single monolithic model handling everything.
This is not a minor pivot. This is Anthropic essentially saying: the scaling approach was wrong. We need to rethink the architecture.
It's worth noting that Rush was built this way from day one. Not out of prescience, but because the constraint revealed the solution. When you build tools for specific workflows (content creation, research, outreach), you don't try to make one agent handle all three. You build Rabbit Hole to crawl competitor websites. You build Email Ninja to draft messages. You build Content Writer to turn source material into platform-native variants. Each agent has one job. Each agent has clean context.
These agents talk to each other. They pass information. But they don't lose clarity about what they're doing because they're not trying to do everything.
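The shape of that handoff can be sketched in a few lines. Everything below is illustrative: the agent names, fields, and "facts" are placeholders, not Rush's actual API. The point is that each agent's working memory holds only its own concern, and what crosses the boundary is a distilled brief, not raw context:

```python
# A minimal sketch of specialized agents with isolated contexts.
# Names, fields, and facts are illustrative placeholders.
from dataclasses import dataclass, field


@dataclass
class ResearchAgent:
    # Working memory holds only research state: sources, facts, uncertainty.
    context: dict = field(default_factory=dict)

    def run(self, competitor: str) -> dict:
        self.context["competitor"] = competitor
        # ...crawl pages, score sources, resolve conflicts...
        # Hand off a distilled brief, never the raw working memory.
        return {"competitor": competitor,
                "key_facts": ["recently raised funding", "entry plan at $99/mo"]}


@dataclass
class EmailAgent:
    # Working memory holds only outreach state: recipient, goal, tone.
    context: dict = field(default_factory=dict)

    def run(self, brief: dict, recipient: str) -> str:
        self.context.update(recipient=recipient, goal="book a call")
        facts = "; ".join(brief["key_facts"])
        return f"Hi {recipient}, quick note on {brief['competitor']}: {facts}."


# The research context never leaks into the email agent's working memory.
brief = ResearchAgent().run("Acme Corp")
draft = EmailAgent().run(brief, "Dana")
```

The design choice doing the work here is the narrow interface between agents: the email agent can only see what the brief contains, so there is nothing extraneous to poison its context.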
The Compounding Advantage
Here's where it gets interesting: specialization compounds in the opposite direction.
A specialized research agent can go deeper. It can develop sophisticated techniques for identifying what matters versus what's noise. It can build its own evaluation criteria without worrying about whether those criteria make sense for email drafting. It can be opinionated about sources and methodology because its opinion has only one downstream consumer.
An email agent can optimize for the specific patterns that make messages effective: open rates, reply rates, persuasion dynamics. It doesn't have to hedge its bets on general-purpose writing quality. It can be maximalist about voice, structure, and call-to-action because it's not also trying to write research reports and strategy documents.
The result is that specialized agents sidestep context poisoning by segregating concerns, and because they're segregated, each one can be relentlessly optimized for its domain. That optimization is the real answer to the compounding-failure math: it raises per-step accuracy.
String together five specialized agents, each tuned to 99% reliability on its narrow job, and you get roughly 0.99^5 ≈ 0.95, or 95% end-to-end. A general-purpose agent whose per-step accuracy sags to, say, 90% under the weight of five competing contexts compounds to 0.90^5 ≈ 0.59. And the specialized team gets there faster, because you're not fighting the coordination problem of forcing one system to hold five different contexts in tension.
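The comparison is easy to sanity-check. The per-step accuracies below are illustrative assumptions, not measured benchmarks; what matters is how the gap widens as steps chain:

```python
# Five chained steps: a specialized team holding high per-step accuracy vs. a
# general-purpose agent whose accuracy sags under competing contexts.
# Both accuracy figures are assumptions for illustration.
STEPS = 5
specialized = 0.99 ** STEPS   # each agent tuned to one narrow job
general = 0.90 ** STEPS       # one model juggling every context

print(f"specialized team: {specialized:.2f}")  # ≈ 0.95
print(f"general agent:    {general:.2f}")      # ≈ 0.59
```

A small per-step edge turns into a large end-to-end gap, which is why tuning each agent for one domain pays off more than marginal gains in a generalist.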
What Users Should Expect
The implication of this shift is worth dwelling on: the future of reliable AI work isn't better general-purpose models. It's better specialized teams.
This changes what you should expect from your AI tools. Not "Is Claude smarter than GPT-4?" but "Can this system maintain clear context boundaries while delegating between specialized functions?" Not "How good is this model at everything?" but "How good is this model at the specific thing it's supposed to do, given that it doesn't have to do anything else?"
General-purpose agents are how you build cool demos. Specialized agent teams are how you build reliable products.
Once you see this, you can't unsee it. You start noticing which problems require specialized attention and which ones traditional general agents handle fine. You start recognizing context poisoning when you see it—that specific flavor of AI hallucination that comes from too many threads tangling together. You start asking why you'd want one tool that's mediocre at ten things when you could have five tools that are exceptional at their specific domain.
The market seems to be reaching this conclusion. Anthropic is building agent constellations. OpenAI is quietly investing in agentic frameworks. Everyone is moving in the same direction.
The question isn't whether specialized agents are the future. The question is whether you've been building or buying tools based on the old assumption that bigger and more general is better.
It's probably time to rethink that.
Ready to see how specialized agents work in practice? Try Rush and experience the difference of agents that each do one thing exceptionally well.