We pointed an AI agent at a topic domain and told it to write. Not one article. Not ten. Just… keep going. Research, write, publish, measure, repeat. An infinite content loop.
After 90 days it had produced 147 articles. Organic traffic was up 340%. On paper, a success.
But around article 80, something shifted. The writing started feeling familiar. Not identical — the agent was too smart for that — but structurally repetitive. The same frameworks applied to slightly different topics. The same transitions. The same conclusions dressed in different clothes. The agent had started writing variations of its own previous articles without realising it.
And the search engines noticed. Rankings started slipping. Not because any single article was bad, but because the collective signal was: this is a content farm with a thesaurus.
The lesson we keep coming back to: volume without a quality gate isn’t a strategy. It’s a trap. The agent was optimising for output, and output was the wrong metric. The best-performing articles — by a wide margin — were the ones where the agent found a genuine gap in the conversation. Not a trending keyword. Not a competitor’s topic. An actual question nobody had answered well.
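One way to make a quality gate concrete is a similarity check against everything already published: reject a draft that looks too much like prior output. This is a hypothetical sketch, not how the actual system worked. It uses crude bag-of-words cosine similarity (a real pipeline would more likely compare embeddings, and lexical similarity would miss the structural repetition described above); the function names and the 0.8 threshold are invented for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words vector: lowercased word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    denom = norm_a * norm_b
    return dot / denom if denom else 0.0


def passes_quality_gate(draft: str, published: list[str],
                        threshold: float = 0.8) -> bool:
    """Reject a draft that is too similar to anything already published."""
    draft_vec = vectorize(draft)
    return all(
        cosine_similarity(draft_vec, vectorize(article)) < threshold
        for article in published
    )
```

Even a gate this naive changes the agent's incentive: a near-duplicate of article 79 gets bounced instead of published, forcing the loop to move on rather than circle its own earlier work.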
The sweet spot we landed on: human picks the topics, AI writes the content. The human’s job isn’t to write — it’s to taste. To know the difference between something worth saying and something that’s just filling a calendar.
This directly shaped how we think about Flywheel’s content generation. The system can produce infinite output. The skill is knowing when to stop. The question worth testing next: can the agent learn taste — or is that irreducibly human?