
Agents don't degrade gracefully

pattern · 19 Oct 2025
agent-limits · reliability · wayfinder


This is maybe the most important thing we’ve learned building agent systems, and it came from the Decision Tree Collapse experiment.

We gave an agent a rostering problem — the kind that powers Wayfinder — and gradually turned up the complexity. Five carers: perfect output. Ten: still solid. Twenty: accurate. Thirty: slight wobble, but usable.

Then somewhere between 30 and 50, the agent crossed a line. Not a gradual decline. A cliff. The schedules still looked valid. The formatting was clean. The confidence scores were high. But buried inside the output were double-bookings, impossible travel times, and constraint violations that would only surface when a real human tried to execute the roster.

Beyond 50, the agent confidently produced schedules that broke multiple hard rules while reporting that everything was fine.

This is the pattern we keep seeing across every domain: agents don’t get gradually worse. They work, they work, they work — and then they cross a threshold and start producing plausible-looking nonsense. With confidence.

The most dangerous output isn’t the obviously wrong answer. It’s the 95% correct one. The one that looks right enough that nobody checks. The 5% that’s wrong is invisible unless you build a verification layer specifically designed to catch it.
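As a concrete illustration, here's a minimal sketch of what such a verification layer can look like for a roster: a deterministic checker, entirely separate from the agent that produced the schedule, that hunts specifically for double-bookings and impossible travel times. The data shapes and field names are assumptions for the example, not Wayfinder's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Visit:
    carer_id: str
    client_id: str
    start: datetime
    end: datetime
    travel_minutes_to_next: int  # assumed to come from a routing service


def find_hard_violations(visits: list[Visit]) -> list[str]:
    """Return human-readable hard-rule violations; an empty list means the roster passed."""
    violations: list[str] = []
    by_carer: dict[str, list[Visit]] = {}
    for v in visits:
        by_carer.setdefault(v.carer_id, []).append(v)

    for carer_id, schedule in by_carer.items():
        schedule.sort(key=lambda v: v.start)
        for current, nxt in zip(schedule, schedule[1:]):
            # Double-booking: the next visit starts before the current one ends.
            if nxt.start < current.end:
                violations.append(
                    f"{carer_id}: {current.client_id} overlaps {nxt.client_id}"
                )
            # Impossible travel: the gap between visits is shorter than the travel time.
            elif nxt.start - current.end < timedelta(minutes=current.travel_minutes_to_next):
                violations.append(
                    f"{carer_id}: not enough time to reach {nxt.client_id} "
                    f"(needs {current.travel_minutes_to_next} min)"
                )
    return violations
```

Checks like these are boring on purpose. They don't need to be clever; they need to be independent of the thing they're checking.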

Three things we now do on every agent build because of this:

  1. Separate the thinker from the checker. The agent that generates the output is never the agent that validates it.
  2. Break big problems into small ones. Geographic clustering in Wayfinder kept the agent accurate at 3–4x the scale. The agent doesn’t need to be smarter — it needs a smaller problem.
  3. Build the escalation into the architecture, not the prompt. The system knows its own limits structurally, not because we asked it nicely to be humble. The sketch after this list shows the shape all three take together.
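Put together, the control loop looks roughly like this. It's a sketch of the shape, not Wayfinder's code: the clustering, generation, and checking functions are passed in as parameters because their internals are assumptions here. The point is that verification and escalation live in the control flow, where the agent can't talk its way past them.

```python
from typing import Callable, Iterable, TypeVar

Cluster = TypeVar("Cluster")
Roster = TypeVar("Roster")


def roster_with_guardrails(
    clusters: Iterable[Cluster],                      # small problems, e.g. geographic clusters
    generate: Callable[[Cluster], Roster],            # the "thinker": the agent that drafts a roster
    find_violations: Callable[[Roster], list[str]],   # the "checker": a separate, deterministic validator
) -> tuple[list[Roster], list[tuple[Cluster, list[str]]]]:
    accepted: list[Roster] = []
    escalated: list[tuple[Cluster, list[str]]] = []
    for cluster in clusters:
        draft = generate(cluster)
        violations = find_violations(draft)
        if violations:
            # Escalation is structural: a failed check routes to a human,
            # regardless of how confident the agent's own output claims to be.
            escalated.append((cluster, violations))
        else:
            accepted.append(draft)
    return accepted, escalated
```

Nothing in that loop asks the agent whether it's near its limits. The clustering keeps each call on the safe side of the cliff, and the checker decides what ships.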

This shaped Wayfinder’s entire architecture. It shapes everything we build now. The question we’re still working on: can an agent learn to detect its own approaching cliff — not after it’s crossed it, but before?
