Sequential Testing: Stop Form A/B Tests Early, Safely
Sequential testing lets you end a form A/B test as soon as there's a real winner — without the peeking that inflates false positives. A researcher's guide.

Researchers face a constant tension when testing form variants: you want to act on a result the moment it's real, but stopping a normal A/B test early — the instant it looks significant — is one of the most reliable ways to fool yourself. Sequential testing resolves that tension by deciding the stopping rule before you collect a single response.
Why is it risky to stop an A/B test as soon as it looks significant?
Because checking for significance over and over and stopping at the first "win" inflates false positives. Armitage, McPherson and Rowe (1969) showed that repeating a significance test on accumulating data pushes the chance of a false positive well above the nominal 5%. The more often you peek, the more likely random noise crosses your threshold at some point — so the "winner" you stop on is frequently a fluke.
What is sequential testing?
A test designed up front to be monitored as data arrives, with a stopping boundary built into the design. Because the rule already accounts for the repeated looks, crossing the boundary is a valid reason to stop — unlike peeking at a fixed-horizon test. You trade a fixed end date for a fixed rule, and in return you can often end far sooner.
How does Evan Miller's simple sequential test work?
Pick a sample size N from a power calculation, split traffic 50/50, count the successes in each arm, and watch the lead. In Evan Miller's Simple Sequential A/B Testing, you track T − C (treatment successes minus control successes) and apply two rules: stop and declare the treatment a winner the moment T − C reaches 2√N; if T + C reaches N first, stop and call it no winner. For a two-sided test, stop when |T − C| reaches 2.25√N.

Notably, the method rests on gambler's ruin (a random walk toward one of two boundaries), not the SPRT. It assumes nothing about the distribution of possible effects and ignores the number of failures in each group — which is what makes it simple enough to run with basic arithmetic.
How much faster is it — and when does it fall short?
At low conversion rates it can cut the observations you need by half or more, because you stop the instant the lead is decisive. The trade-offs Miller notes are real: the advantage shrinks above roughly 10% conversion, a test with no true effect can run longer than a fixed-size test, and the savings fade when the real effect is large (you'd have spotted it quickly anyway). The one rule you cannot break: don't peek or extend beyond the N and boundary you committed to.
How to run a sequential form test in RoundPushPin
Define a clear success — a completed form or a qualified lead — split visitors across two form branches, and track each arm's successes in your structured data. Because every response lands as a typed row, counting T and C and computing the 2√N boundary is a simple query, and you stop the moment the lead crosses it. Pair this with how to A/B test forms for the setup and form completion rate for choosing the metric — and use the built-in heatmaps and drop-off analytics to understand why a variant wins, not just that it did.
This guide adapts the method from Evan Miller's Simple Sequential A/B Testing (2015). See the original for the full derivation and the power-calculation formula for choosing N.
Frequently asked questions
- Can I stop an A/B test early when it looks significant?
- Not with an ordinary fixed-horizon test — stopping at the first significant result inflates false positives. A sequential test is designed for it: the stopping boundary already accounts for checking the data as it arrives, so an early stop stays valid.
- What is the peeking problem?
- Peeking is repeatedly checking an A/B test for significance and stopping at the first 'win'. Armitage and colleagues (1969) showed this pushes the false-positive rate well above the nominal level, so apparent winners are often flukes.
- When is sequential testing not worth it?
- Its savings are largest at low conversion rates. Above roughly 10% conversion the advantage shrinks, a truly null test can take longer than a fixed-size test, and you must not peek or extend beyond the rule you set.
Sources
- Evan Miller — Simple Sequential A/B Testing (2015) — Evan Miller
- Armitage, P., McPherson, C. K. & Rowe, B. C. (1969) — Repeated Significance Tests on Accumulating Data — Journal of the Royal Statistical Society, Series A
Keep reading
How to A/B Test Forms (and Read the Results)
A/B testing a form means showing two versions to comparable groups to see which converts better. Learn what to test, how to read results, and how long to run.
How to Build Trust in Your Forms (So People Complete Them)
People won't hand data to a form they don't trust. A research-backed guide to how visitors judge credibility and the trust signals that matter on forms.