GENE DAVIS · Santa Monica, CA
Selected Work · Chapter 01 / 05
Experimentation · CRO

The Discipline of Small Wins

Core Digital Media, 2010–2019
A/B and multivariate testing
Conversion rate optimization
Segmentation strategy
Cross-functional ideation
Statistical analysis
10,000+ A/B and multivariate tests
20+ active testing segments
§01 · Lede

Across nearly a decade and over 10,000 A/B and multivariate tests, the practice centered on conversion experimentation on mature funnels. The operating model was honed over the years into a consistent shape: a quarterly roadmap, a regular testing cadence, and cross-functional ideation drawing from product, analytics, marketing, and other teams. The Education vertical was the primary focus.

The Education work centered on classesusa.com, an online portal helping prospective students find degree and certificate programs. The inquiry pathway was 20 to 25 questions deep depending on path. Each segment — defined by device and traffic source — ran separate landing page and funnel testing tracks. Mobile carried the most volume. At peak, dozens of challengers ran concurrently across the segmentation map.

The remit covered the full test cycle: ideation through design, build, launch, monitoring, and rollout. Win criteria were defined in advance and held without exception.

§02 · Hypothesis

What the practice was built on.

Three operating beliefs anchored the work.

First, disciplined small wins compound into big ones. The program ran on the assumption that ~5% lifts, held to confidence and captured across a sustained testing cadence, would produce more cumulative lift than chasing big-swing tests. The math favored cadence over swing.
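
The arithmetic behind that assumption is easy to sketch. A minimal illustration in Python, using hypothetical win sizes and counts (twelve ~5% wins against two 20% swings) chosen only to show how lifts compound, not drawn from program data:

```python
# Illustrative sketch of the cadence-over-swing arithmetic.
# Win sizes, win rates, and test counts are hypothetical.

def cumulative_lift(wins: list[float]) -> float:
    """Compound a sequence of relative lifts into one cumulative lift."""
    total = 1.0
    for w in wins:
        total *= (1.0 + w)
    return total - 1.0

# Cadence: twelve ~5% wins captured over a sustained testing year.
cadence = cumulative_lift([0.05] * 12)   # (1.05)^12 - 1, roughly 0.80

# Swing: two big redesign wins of 20% each over the same period.
swing = cumulative_lift([0.20] * 2)      # (1.20)^2 - 1, roughly 0.44

print(f"cadence: {cadence:.0%}, swing: {swing:.0%}")
```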

Second, on mature funnels the leverage tends to sit in messaging and segmentation more than design. Big-bang redesigns rarely move the number; specific copy and audience-specific framing repeatedly do.

Third, the segmentation map needs its own testing tracks. Different segments arrive with different circumstances and intents — what lands with one may not land the same way with another. The program was built around running separate tracks within each segment.

These beliefs determined the operating shape. Win thresholds set in advance at 90% confidence — Fisher for landing pages, Bayes for funnels. Sample sizes calibrated per segment to detect a lift at that confidence within a workable testing window — weekly cadence on the high-volume segments, longer cycles on the smaller ones. Wins rolled across the segmentation map where applicable, once they cleared the bar.
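
As a sketch of what the landing-page win check looks like at that bar, here is a minimal Fisher exact read at 90% confidence. The counts, the one-sided alternative, and the function name are assumptions for illustration, not the program's actual configuration:

```python
# Minimal sketch of a pre-registered landing-page win check, assuming a
# Fisher exact test on converted / not-converted counts per variant.
from scipy.stats import fisher_exact

ALPHA = 0.10  # 90% confidence, fixed before the test launches

def is_win(control_conv, control_total, challenger_conv, challenger_total):
    """Return True only when the challenger clears the pre-set threshold."""
    table = [
        [challenger_conv, challenger_total - challenger_conv],
        [control_conv,    control_total    - control_conv],
    ]
    # "greater" tests whether the challenger's conversion odds exceed control's.
    _, p_value = fisher_exact(table, alternative="greater")
    return p_value < ALPHA

# Hypothetical per-segment read: challenger 540/9,800 vs control 500/10,000.
print(is_win(500, 10_000, 540, 9_800))
```

On the funnel side, a Bayesian posterior read would stand in for the p-value check; the same pre-set threshold discipline applies either way.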

§03 · The Work

Inside the cycle.

The cycle ran on a quarterly roadmap with regular ideation sessions drawing from across the organization — product, analytics, marketing, and others. Ideas were grounded in the funnel data within each segment — what funnel position, what kind of change. The roadmap held a few weeks of buffer ahead of the build queue; that buffer absorbed the inevitable side projects, holidays, and urgent asks and kept the testing cadence intact.

Design and front-end build came from my desk for most of these tests — delivered through an in-house experimentation platform that handled variant setup and accepted HTML and CSS, with a secondary path for JS injection on more complex changes. QA and back-end engineering filled out the rest, with back-end stepping in where the platform’s scope required it. Tests were instrumented for stat-sig win detection from launch, and acted on only at the threshold. Sample sizes were calibrated per segment to hit confidence within a practical testing window. Volume-tiered RPV-loss thresholds defined when to pull a challenger early — distributing that traffic across the challengers still in contention.
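
For the sample-size calibration, a rough sketch of the kind of calculation involved, assuming a standard two-proportion power calculation at the 90% bar; the baseline rate, target lift, and power figure are hypothetical inputs, not program data:

```python
# Rough sketch of per-segment sample-size calibration, assuming a
# two-proportion power calculation. All inputs below are hypothetical.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

def visitors_per_variant(baseline_cr: float, relative_lift: float,
                         alpha: float = 0.10, power: float = 0.80) -> int:
    """Visitors needed per variant to detect the given lift at the given confidence."""
    effect = proportion_effectsize(baseline_cr * (1 + relative_lift), baseline_cr)
    n = NormalIndPower().solve_power(effect_size=effect, alpha=alpha,
                                     power=power, alternative="larger")
    return int(round(n))

# Hypothetical mobile segment: 8% baseline conversion, targeting a ~5% relative lift.
print(visitors_per_variant(0.08, 0.05))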

Fig. 01 · Test cycle · Core Digital Media, 2010–2019

L1 · Plan: roadmap held weeks ahead of queue
    Quarterly roadmap · Cross-functional ideation · Funnel data by segment · Buffer for disruption
L2 · Build: design, engineering, and QA
    Landing page design · Front-end build · Back-end engineering · QA and instrumentation
L3 · Run: instrumented from launch
    Launch by segment · Stat-sig win detection · Sample size per segment · RPV-loss thresholds
L4 · Resolve: acted on only at the threshold
    Win at confidence threshold · Rollout across segmentation map · Early pull for underperformers · Feed back to roadmap

A win in one segment ran against the applicable others — each required its own confirmation before the change held. Most wins traveled that way. Periodically the work zoomed out into broader redesigns or form consolidations — collapsing separately-optimized segmented forms into a unified architecture that could perform across all of them. Those ran on longer cycles and required a different kind of rigor. The cumulative lift came from cadence: small wins, captured at confidence, repeated over time.

§04 · The Lesson

What the data taught.

A decade of testing produced plenty of lessons. Three carried through with the most consistency.

On mature funnels, messaging carried the strongest signal; audience-specific framing in particular moved conversion rate more often than any other lever. Layout and visual treatment, by comparison, carried less weight — aesthetic conviction tends to be a weaker signal where performance is what’s being measured. For someone who came up in design, that took some accepting in the early years. The data, repeatedly, won the argument. Data doesn’t lie — not often, anyway.

The second lesson was methodological and economic. Front-end tests — copy, microcopy, and headlines — were fast to build and fast to roll, so wins could accumulate and be batched into a single dev rollout. Complex tests — like moving a question from the funnel up to the landing page, or adjusting matching logic — required dev and QA cycles on both ends, so the cost of those cycles factored into planning and prioritization. Each had its place. Simple tests were efficient and could move the number on their own. Complex tests probed mechanics the front end couldn’t reach, often yielding deeper insight.

The third was about discipline under pressure. There’s always pressure to call wins early — partial reads, close numbers, or when bonus cycles or recognition optics are in play. Arguments tend to take a few forms — “if it’s that close, does it really matter?” or “the trend’s clear, let’s call it.” Or the perennial: “we could really use another win this quarter.” On any single call, probably not. But the methodology only earns its edge if the threshold is held without exception. Make one exception and the system erodes. The cost may be invisible on any single test, but it surfaces in the cumulative numbers.

§05 · Selected Artifacts

5 entries · Core Digital Media, 2010–2019
F-01 · Sample size calculator · Testing methodology · Per-segment calibration · 2015
F-02 · RPV wins log · Test record · Win threshold documentation · 2015
F-03 · Form fragmentation matrix · UX artifact · One-at-a-time question architecture · 2014
F-04 · Turndown threshold doc · Process artifact · Early-exit decision criteria · 2014
F-05 · Mobile LP samples · Design artifact · Control and challenger variants · 2014