The hypothesis

Can an AI agent design, deploy, and iterate on email variants autonomously — without human oversight?

The concept

Traditional A/B testing is slow and manual: pick two subject lines, split the list, wait three days, pick a winner. This experiment lets an AI agent design, deploy, and iterate on email variants autonomously, testing subject lines, body copy, send times, and CTAs simultaneously across hundreds of micro-segments.

Email A/B testing

How it works

Campaign brief
Generate variants — 12–20 variants: subject, body, CTA, send time
Micro-segment allocation — 50–200 person test groups
Deploy and wait — 4–8 hour observation window
Score and prune — kill losers, promote winners
Roll out the winner to the full list

The agent completes multiple test cycles in the time it takes a human to set up one A/B test.
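The steps above can be sketched as a small elimination loop. This is a minimal illustration, not the experiment's actual implementation: the Variant class, the halving rule in prune, the fixed segment size of 100, and the simulated opens (standing in for the 4–8 hour observation window) are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    true_open_rate: float  # unknown in practice; used here only to simulate opens
    sends: int = 0
    opens: int = 0

    @property
    def open_rate(self) -> float:
        return self.opens / self.sends if self.sends else 0.0


def run_cycle(variants, segment_size, rng):
    """Send each surviving variant to one fresh micro-segment and record opens."""
    for v in variants:
        opens = sum(rng.random() < v.true_open_rate for _ in range(segment_size))
        v.sends += segment_size
        v.opens += opens


def prune(variants, keep_fraction=0.5):
    """Keep the best-scoring fraction of variants (hypothetical rule), at least one."""
    ranked = sorted(variants, key=lambda v: v.open_rate, reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]


def find_winner(variants, segment_size=100, seed=0):
    """Repeat deploy, score, and prune until a single variant remains."""
    rng = random.Random(seed)
    survivors = list(variants)
    cycles = 0
    while len(survivors) > 1:
        run_cycle(survivors, segment_size, rng)
        survivors = prune(survivors)
        cycles += 1
    return survivors[0], cycles


# 16 simulated variants are halved each cycle, so one remains after 4 cycles.
variants = [Variant(f"v{i}", 0.15 + 0.01 * i) for i in range(16)]
winner, cycles = find_winner(variants)
print(winner.name, cycles, round(winner.open_rate, 3))
```

Halving on raw open rate is the simplest possible pruning rule; a real agent would probably also weigh clicks and statistical uncertainty before killing a variant.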

What it explores

Can AI-generated email variants outperform human-written ones at scale?
What’s the minimum viable segment size for statistically valid micro-tests?
How many simultaneous variants can you test before results become noise?
Does the agent learn cross-campaign patterns that improve over time?
What guardrails prevent the agent from sending tone-deaf or off-brand emails?
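One way to reason about the segment-size and variant-count questions above is to look at how wide the uncertainty around an observed open rate is. The sketch below uses a Wilson score interval and a Bonferroni-adjusted critical value; both are standard techniques chosen for illustration, not methods the experiment is known to have used, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(opens, sends, z=1.96):
    """Wilson score interval for an open rate observed on one micro-segment."""
    if sends <= 0:
        raise ValueError("sends must be positive")
    p = opens / sends
    denom = 1 + z * z / sends
    centre = (p + z * z / (2 * sends)) / denom
    half = z * math.sqrt(p * (1 - p) / sends + z * z / (4 * sends * sends)) / denom
    return centre - half, centre + half


def bonferroni_z(num_comparisons, alpha=0.05):
    """Two-sided critical z-value after dividing alpha across all comparisons."""
    return NormalDist().inv_cdf(1 - alpha / (2 * num_comparisons))


# Same 20% observed open rate at the two ends of the 50-200 segment range.
for sends in (50, 200):
    low, high = wilson_interval(opens=sends // 5, sends=sends)
    print(f"n={sends}: {low:.3f} to {high:.3f}")

# Testing 20 variants at once demands a stricter threshold than testing one.
print(round(bonferroni_z(1), 2), round(bonferroni_z(20), 2))
```

The interval narrows as the segment grows and the threshold rises as variants are added, which is the trade-off behind both questions: more variants on smaller groups means each comparison carries less evidence.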

What we found

The agent found winning variants 3–4x faster than manual A/B testing
Open rates improved by 18% on average across test campaigns
Send-time personalisation mattered more than subject line variation for most segments

Learnings

Micro-segmentation with 50+ recipients per group produces reliable signal — below that, noise dominates.
Subject line variations plateau quickly — structural changes to email body are where real gains live.
Without brand voice guardrails, the agent optimises toward clickbait — constraints are a feature, not a limitation.
Cross-campaign memory is the real compound advantage — learnings from campaign N inform campaign N+1.
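Cross-campaign memory could be modelled in many ways. The sketch below assumes one simple option: a running Beta-Bernoulli tally per variant feature, so results from campaign N shift the starting expectation for campaign N+1. The FeatureMemory class, the feature labels, and the numbers are all hypothetical and are not taken from the experiment.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureMemory:
    """Keeps a Beta(alpha, beta) belief about open rate for each variant feature."""
    prior_alpha: float = 1.0  # uniform prior: no opinion before any data
    prior_beta: float = 1.0
    counts: dict = field(default_factory=dict)  # feature -> [opens, non_opens]

    def record(self, feature, opens, sends):
        """Fold one campaign's results for a feature into the running tally."""
        tally = self.counts.setdefault(feature, [0, 0])
        tally[0] += opens
        tally[1] += sends - opens

    def expected_open_rate(self, feature):
        """Posterior mean open rate; falls back to the prior for unseen features."""
        opens, misses = self.counts.get(feature, (0, 0))
        alpha = self.prior_alpha + opens
        beta = self.prior_beta + misses
        return alpha / (alpha + beta)


memory = FeatureMemory()

# Campaign N: made-up results for two send-time features.
memory.record("send_time=morning", opens=30, sends=100)
memory.record("send_time=evening", opens=18, sends=100)

# Campaign N+1 starts from these expectations instead of from scratch.
ranked = sorted(memory.counts, key=memory.expected_open_rate, reverse=True)
print(ranked[0], round(memory.expected_open_rate(ranked[0]), 3))
```

Seeding the next campaign's variant generation with the highest-ranked features is one plausible use of such a memory; a brand-voice check would still need to run on every generated variant before sending.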

Where it goes next

This experiment directly fed into Flywheel’s email campaign agent. The micro-segmentation and autonomous test loop architecture is now a core feature of the product.