We spent weeks testing whether the format you deliver instructions in (JSON, XML, or Markdown) changes how well AI agents execute them. Same content, three containers, fresh sessions, multiple models.
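To make "same content, three containers" concrete, here is a minimal sketch of the idea. The instruction itself, its fields, and the rendering helpers are illustrative inventions, not our actual test harness: one dictionary of instruction content, serialised three ways.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical instruction content; field names are illustrative only.
instruction = {
    "task": "summarize the document",
    "constraints": ["max 100 words", "neutral tone"],
}

# Container 1: JSON, straight from the standard library.
as_json = json.dumps(instruction, indent=2)

# Container 2: XML, built element by element.
root = ET.Element("instruction")
ET.SubElement(root, "task").text = instruction["task"]
constraints_el = ET.SubElement(root, "constraints")
for c in instruction["constraints"]:
    ET.SubElement(constraints_el, "constraint").text = c
as_xml = ET.tostring(root, encoding="unicode")

# Container 3: Markdown, plain headings and a bullet list.
as_markdown = (
    "## Task\n"
    + instruction["task"]
    + "\n\n## Constraints\n"
    + "\n".join(f"- {c}" for c in instruction["constraints"])
)
```

The experiment held the content fixed and varied only which of these three strings the agent received.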
The short answer: it doesn’t matter nearly as much as the internet thinks it does.
JSON was best for structured tool schemas. XML was best for deeply nested conditional logic. But Markdown — plain, boring Markdown — was the most consistent performer across every combination of model, task type, and output format. Never the winner on any single axis. Always reliable.
The bigger finding: the variable that actually moved the needle wasn’t format. It was task type. How you structure the content of instructions matters enormously. How you package that content matters surprisingly little.
We see a lot of engineering time spent on format optimisation — JSON schemas, XML wrappers, elaborate structured prompts. Most of it is wasted effort. Write clear instructions in Markdown and move on to the problems that actually matter.
This is why both AI Toolkit and Orchesta default to Markdown for skill definitions. Not because it’s technically superior. Because it’s the format that works well enough everywhere and lets you focus on what you’re actually saying.
The open question: as models improve, does format matter even less? We plan to re-run this experiment every six months to find out.