
The Siri Trap: Why Single-Agent Assistants Always Fail

Siri launched in 2011. $100B+ invested across Siri, Alexa, and Google Assistant. After 15 years, the most common use case is still "set a timer." The failure isn't talent. It's architecture.

Siri launched in 2011. Alexa in 2014. Google Assistant in 2016. The best AI engineers on earth, backed by the richest companies in history, working on the same problem for over a decade.

None of them can reliably set a reminder AND check your calendar AND draft an email in one interaction.

After 15 years, the most common use case is still setting timers and alarms.

This isn't a talent problem. It's an architectural one. And every major tech company is about to repeat the same mistake with LLMs unless they understand why.

The Generalist Ceiling

The assumption behind every single-agent assistant is the same: build one model smart enough to handle everything. Make it better every year. Eventually it'll be good enough.

The reality: general-purpose systems hit a ceiling where breadth kills depth.

Medicine learned this a century ago. The "general practitioner" was how medicine started -- one doctor who handled everything from broken bones to heart disease. Medicine didn't advance by making GPs smarter. It advanced by specializing. In 1931, 84% of doctors considered themselves GPs. By 1965, that had fallen to 37%. Today the American Board of Medical Specialties recognizes 40 specialties and 89 subspecialties. Each specialist goes deeper than any generalist ever could. The GP still exists, but their job changed: they became the coordinator, routing patients to the right specialist.

Geoffrey West's research on scaling in cities reveals something similar. Cities produce disproportionate creative and economic output -- but not because city dwellers are individually smarter. The output scales superlinearly with specialization density. More specialists interacting in close proximity produces more than the sum of their parts. The gains come from the interactions between specialists, not from any individual's breadth of knowledge.

The lesson applies directly to AI. A single agent that tries to handle email, calendar, files, music, smart home, navigation, and messaging is the medical equivalent of one doctor practicing every specialty simultaneously. It's not a scaling problem. It's a category error.

Why "Just Make It Smarter" Doesn't Work

Every year the same headline: "This year Siri gets a major upgrade." Every year the same result: marginal improvement.

The architectural problem is fundamental. A single agent maintaining context across dozens of domains means the context window becomes a battlefield. Every domain competes for attention. Email context crowds out calendar context. Navigation instructions interfere with music preferences. Smart home state collides with messaging history.

The agent optimizes for the average case and excels at nothing. It becomes the median of its capabilities rather than the maximum. Adding more parameters, more training data, more compute -- none of this solves the structural problem. You're making a bigger generalist, not a better system.
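The arithmetic behind the "battlefield" is easy to make concrete. A minimal sketch, with entirely hypothetical numbers (the token budget and domain list are assumptions for illustration, not measurements of any real assistant):

```python
# Illustrative arithmetic (all numbers are assumptions): a generalist
# splits one context window across every domain it serves, while each
# specialist gets the full window for its own domain.

CONTEXT_TOKENS = 8_000  # hypothetical per-agent context budget
DOMAINS = ["email", "calendar", "files", "music",
           "smart home", "navigation", "messaging"]

# Single agent: every domain competes for the same budget.
generalist_share = CONTEXT_TOKENS // len(DOMAINS)

# Orchestrated: one specialist per domain, each with the whole budget.
specialist_share = CONTEXT_TOKENS

print(f"generalist: ~{generalist_share} tokens per domain")
print(f"specialist: {specialist_share} tokens per domain")
```

Seven domains sharing one window leave each with roughly a seventh of the budget; a specialist keeps all of it. Bigger windows raise both numbers without changing the ratio, which is why scale alone doesn't close the gap.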

Contrast this with how the best human organizations work. A CEO doesn't answer support tickets, write marketing copy, debug code, and negotiate contracts in the same afternoon. They have a team. The CEO's job is intent and coordination. The team's job is execution. The quality of the organization doesn't depend on how much the CEO personally knows about each domain. It depends on the quality of the specialists AND the quality of the coordination layer.

Single-agent assistants put the CEO on the support desk.

The Orchestration Alternative

What if instead of one smarter agent, you had a coordinator that dispatched to specialists?

Consider a concrete example: "Prepare for my meeting with Acme tomorrow."

The single-agent approach: The assistant tries to search your calendar, find relevant emails, locate related documents, and summarize your notes -- all in one pass. It finds the meeting. It pulls up two of the seven relevant emails. It misses the shared document entirely because the context window is already saturated with calendar data. It produces a passable but incomplete summary. You spend 15 minutes filling in the gaps yourself.

The orchestrated approach: A coordinator identifies the task and dispatches three specialists. A calendar agent finds the meeting, pulls attendee details, checks the agenda, and flags that the time was moved twice (relevant context). An email agent surfaces the entire thread that led to this meeting, including the attachment Sarah sent last Tuesday with revised pricing. A research agent pulls Acme's recent press releases -- they just closed a funding round, which changes your negotiation position.

Each specialist goes deep in its domain. The coordinator synthesizes results into a single briefing.
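Mechanically, the pattern is simple: fan out, then synthesize. A minimal sketch in Python, where the specialist functions are stand-ins for model-backed agents -- their names and canned return values are hypothetical, not any product's API:

```python
import asyncio

# Orchestration sketch: a coordinator dispatches one task to several
# domain specialists concurrently, then merges their results into a
# single briefing. Specialists here are hypothetical stubs.

async def calendar_agent(task: str) -> str:
    return "Meeting at 2pm; time moved twice; attendees confirmed"

async def email_agent(task: str) -> str:
    return "Seven relevant messages; Tuesday attachment has revised pricing"

async def research_agent(task: str) -> str:
    return "Acme closed a funding round last week"

SPECIALISTS = [calendar_agent, email_agent, research_agent]

async def coordinator(task: str) -> str:
    # Fan out to every specialist at once, then synthesize in order.
    results = await asyncio.gather(*(agent(task) for agent in SPECIALISTS))
    return "\n".join(f"- {r}" for r in results)

briefing = asyncio.run(coordinator("Prepare for my meeting with Acme tomorrow"))
print(briefing)
```

The coordinator never needs deep knowledge of any domain; its job is routing and synthesis, which is exactly the CEO-and-team division of labor described above.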

The output isn't incrementally better. It's categorically different. The single agent gives you a reminder. The orchestrated system gives you preparation.

Why This Is Happening Now

If orchestration is so obviously better, why hasn't it already won? Because until recently, it wasn't feasible. Three things converged in 2025-2026 that changed the equation.

Models got cheap enough to run multiple agents on one task. In early 2023, running five agents on a single task would cost roughly $0.50 -- prohibitive for consumer use. By late 2025, the cost of equivalent capability had been falling by roughly 10x per year, and on some benchmarks by as much as 900x overall. Running five agents now costs less than a single interaction did two years ago. The economic barrier to orchestration evaporated.
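The cost claim follows directly from the article's own figures. A worked check, taking the $0.50 starting point and a 10x-per-year decline as given assumptions:

```python
# Cost arithmetic under the article's stated assumptions: five agents
# cost ~$0.50 per task in early 2023, and equivalent capability got
# ~10x cheaper per year. All figures are assumptions, not benchmarks.

cost_2023 = 0.50          # five agents, one task, early 2023
annual_drop = 10          # assumed cost decline per year
years = 2

cost_now = cost_2023 / annual_drop ** years   # five agents today
single_agent_2023 = cost_2023 / 5             # one agent, one task, 2023

print(f"five agents now:         ${cost_now:.3f}")
print(f"one agent two years ago: ${single_agent_2023:.2f}")

# Five agents today undercut a single agent from two years ago.
assert cost_now < single_agent_2023
```

Half a cent for a five-agent dispatch versus ten cents for one 2023 interaction: the fan-out that was prohibitive is now a rounding error.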

Protocols emerged for agent interop. Anthropic open-sourced the Model Context Protocol and Google launched the Agent-to-Agent protocol, giving agents a standard way to discover each other's capabilities and communicate. Before this, every multi-agent system was custom-built, bespoke, fragile. Standard protocols turned agent orchestration from a research project into an engineering problem.
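What these protocols standardize, at heart, is capability discovery: agents advertise what they can do, and a coordinator routes by capability instead of by hard-wired integration. The sketch below illustrates that idea only -- it is not the actual MCP or A2A wire format, and the agent names and capability strings are hypothetical:

```python
from dataclasses import dataclass

# Generic capability-discovery sketch (NOT the real MCP/A2A format):
# each agent publishes a card listing its capabilities, and the
# coordinator looks up who can serve a request.

@dataclass
class AgentCard:
    name: str
    capabilities: list[str]

REGISTRY = [
    AgentCard("calendar", ["find_events", "check_availability"]),
    AgentCard("email", ["search_threads", "draft_message"]),
]

def route(capability: str) -> str:
    # Return the first registered agent advertising the capability.
    for card in REGISTRY:
        if capability in card.capabilities:
            return card.name
    raise LookupError(f"no agent offers {capability!r}")

print(route("search_threads"))
```

Because discovery is declarative, adding a new specialist means registering a card, not rewriting the coordinator -- which is what turns orchestration from bespoke glue code into an engineering problem.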

Consumer hardware got powerful enough for local orchestration. Apple's M-series chips can run multiple agent processes locally without round-tripping to the cloud for every interaction. Local orchestration means lower latency, better privacy, and no per-request API costs for the coordination layer. The orchestrator runs on your machine. The specialists can run locally or in the cloud depending on the task.

None of these conditions existed when Siri launched. The single-agent architecture wasn't a bad choice in 2011. It was the only viable choice. But it's a choice now. And it's the wrong one.

The Coordination Layer Is the Product

Apple's revamped Siri -- delayed to Spring 2026 -- is a better single agent. Smarter model. More integrations. Deeper system access. But it's still one agent trying to do everything. It's a better GP, not a hospital.

The next decade of AI assistants won't be defined by which company has the smartest model. It'll be defined by who builds the best orchestration layer -- the coordination infrastructure that lets specialized agents work together on tasks no single agent can handle well.

The parallel to computing history is exact. The value of an operating system was never in any individual program. It was in the ability to run many programs, manage their interactions, and present a unified experience to the user. The OS was the coordination layer for software. An agent OS is the coordination layer for intelligence.

The era of the single genius assistant is ending. The era of coordinated specialist teams is beginning.

The trap is thinking you need a smarter Siri. You don't. You need a system.