GENE DAVIS · Santa Monica, CA
Selected Work · Chapter 01 / 05
Experimentation · CRO

The Discipline of Small Wins

Core Digital Media, 2010–2019
A/B and multivariate testing
Conversion rate optimization
Segmentation strategy
Cross-functional ideation
Statistical analysis
10,000+ A/B and multivariate tests
20+ active testing segments
§01 · Lede

Across nearly a decade and over 10,000 A/B and multivariate tests, the practice centered on conversion experimentation on mature funnels. The operating model was honed over the years into a consistent shape: a quarterly roadmap, a regular testing cadence, and cross-functional ideation drawing from product, analytics, marketing, and other teams. The Education vertical was the primary focus.

The Education work centered on classesusa.com, an online portal helping prospective students find degree and certificate programs. The inquiry pathway was 20 to 25 questions deep depending on path. Each segment — defined by device and traffic source — ran separate landing page and funnel testing tracks. Mobile carried the most volume. At peak, dozens of challengers ran concurrently across the segmentation map.

The remit covered the full test cycle: ideation through design, build, launch, monitoring, and rollout. Win criteria were defined in advance and held without exception.

§02 · Hypothesis

What the practice was built on.

Three operating beliefs anchored the work.

First, disciplined small wins compound into big ones. The program ran on the assumption that ~5% lifts, held to confidence and captured across a sustained testing cadence, would produce more cumulative lift than chasing big-swing tests. The math favored cadence over swing.
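
The arithmetic behind that assumption is easy to sketch. A minimal illustration in Python, using hypothetical win sizes and counts (twelve ~5% wins against two 20% swings) chosen only to show how lifts compound, not drawn from program data:

```python
# Illustrative sketch of the cadence-over-swing arithmetic.
# Win sizes, win rates, and test counts are hypothetical.

def cumulative_lift(wins: list[float]) -> float:
    """Compound a sequence of relative lifts into one cumulative lift."""
    total = 1.0
    for w in wins:
        total *= (1.0 + w)
    return total - 1.0

# Cadence: twelve ~5% wins captured over a sustained testing year.
cadence = cumulative_lift([0.05] * 12)   # (1.05)^12 - 1, roughly 0.80

# Swing: two big redesign wins of 20% each over the same period.
swing = cumulative_lift([0.20] * 2)      # (1.20)^2 - 1, roughly 0.44

print(f"cadence: {cadence:.0%}, swing: {swing:.0%}")
```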

Second, on mature funnels the leverage tends to sit in messaging and segmentation more than design. Big-bang redesigns rarely move the number; specific copy and audience-specific framing repeatedly do.

Third, the segmentation map needs its own testing tracks. Different segments arrive with different circumstances and intents — what lands with one may not land the same way with another. The program was built around running separate tracks within each segment.

These beliefs determined the operating shape. Win thresholds set in advance at 90% confidence — Fisher for landing pages, Bayes for funnels. Sample sizes calibrated per segment to detect a lift at that confidence within a workable testing window — weekly cadence on the high-volume segments, longer cycles on the smaller ones. Wins rolled across the segmentation map where applicable, once they cleared the bar.
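
As a sketch of what the landing-page win check looks like at that bar, here is a minimal Fisher exact read at 90% confidence. The counts, the one-sided alternative, and the function name are assumptions for illustration, not the program's actual configuration:

```python
# Minimal sketch of a pre-registered landing-page win check, assuming a
# Fisher exact test on converted / not-converted counts per variant.
from scipy.stats import fisher_exact

ALPHA = 0.10  # 90% confidence, fixed before the test launches

def is_win(control_conv, control_total, challenger_conv, challenger_total):
    """Return True only when the challenger clears the pre-set threshold."""
    table = [
        [challenger_conv, challenger_total - challenger_conv],
        [control_conv,    control_total    - control_conv],
    ]
    # "greater" tests whether the challenger's conversion odds exceed control's.
    _, p_value = fisher_exact(table, alternative="greater")
    return p_value < ALPHA

# Hypothetical per-segment read: challenger 540/9,800 vs control 500/10,000.
print(is_win(500, 10_000, 540, 9_800))
```

On the funnel side, a Bayesian posterior read would stand in for the p-value check; the same pre-set threshold discipline applies either way.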

§03 · The Work

Inside the cycle.

The cycle ran on a quarterly roadmap with regular ideation sessions drawing from across the organization — product, analytics, marketing, and others. Ideas were grounded in the funnel data within each segment — what funnel position, what kind of change. The roadmap held a few weeks of buffer ahead of the build queue; that buffer absorbed the inevitable side projects, holidays, and urgent asks and kept the testing cadence intact.

Design and front-end build came from my desk for most of these tests — delivered through an in-house experimentation platform that handled variant setup and accepted HTML and CSS, with a secondary path for JS injection on more complex changes. QA and back-end engineering filled out the rest, with back-end stepping in where the platform’s scope required it. Tests were instrumented for stat-sig win detection from launch, and acted on only at the threshold. Sample sizes were calibrated per segment to hit confidence within a practical testing window. Volume-tiered RPV-loss thresholds defined when to pull a challenger early — distributing that traffic across the challengers still in contention.
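
For the sample-size calibration, a rough sketch of the kind of calculation involved, assuming a standard two-proportion power calculation at the 90% bar; the baseline rate, target lift, and power figure are hypothetical inputs, not program data:

```python
# Rough sketch of per-segment sample-size calibration, assuming a
# two-proportion power calculation. All inputs below are hypothetical.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

def visitors_per_variant(baseline_cr: float, relative_lift: float,
                         alpha: float = 0.10, power: float = 0.80) -> int:
    """Visitors needed per variant to detect the given lift at the given confidence."""
    effect = proportion_effectsize(baseline_cr * (1 + relative_lift), baseline_cr)
    n = NormalIndPower().solve_power(effect_size=effect, alpha=alpha,
                                     power=power, alternative="larger")
    return int(round(n))

# Hypothetical mobile segment: 8% baseline conversion, targeting a ~5% relative lift.
print(visitors_per_variant(0.08, 0.05))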

Fig. 01 · Test cycle · Core Digital Media, 2010–2019

L1 · Plan: roadmap held weeks ahead of queue
    Quarterly roadmap · Cross-functional ideation · Funnel data by segment · Buffer for disruption
L2 · Build: design, engineering, and QA
    Landing page design · Front-end build · Back-end engineering · QA and instrumentation
L3 · Run: instrumented from launch
    Launch by segment · Stat-sig win detection · Sample size per segment · RPV-loss thresholds
L4 · Resolve: acted on only at the threshold
    Win at confidence threshold · Rollout across segmentation map · Early pull for underperformers · Feed back to roadmap

A win in one segment ran against the applicable others — each required its own confirmation before the change held. Most wins traveled that way. Periodically the work zoomed out into broader redesigns or form consolidations — collapsing separately-optimized segmented forms into a unified architecture that could perform across all of them. Those ran on longer cycles and required a different kind of rigor. The cumulative lift came from cadence: small wins, captured at confidence, repeated over time.

§04 · The Lesson

What the data taught.

A decade of testing produced plenty of lessons. Three carried through with the most consistency.

On mature funnels, messaging carried the strongest signal; audience-specific framing in particular moved conversion rate more often than any other lever. Layout and visual treatment, by comparison, carried less weight — aesthetic conviction tends to be a weaker signal where performance is what’s being measured. For someone who came up in design, that took some accepting in the early years. The data, repeatedly, won the argument. Data doesn’t lie — not often, anyway.

The second lesson was methodological and economic. Front-end tests — copy, microcopy, and headlines — were fast to build and fast to roll, so wins could accumulate and be batched into a single dev rollout. Complex tests — like moving a question from the funnel up to the landing page, or adjusting matching logic — required dev and QA cycles on both ends, so the cost of those cycles factored into planning and prioritization. Each had its place. Simple tests were efficient and could move the number on their own. Complex tests probed mechanics the front end couldn’t reach, often yielding deeper insight.

The third was about discipline under pressure. There’s always pressure to call wins early — partial reads, close numbers, or when bonus cycles or recognition optics are in play. Arguments tend to take a few forms — “if it’s that close, does it really matter?” or “the trend’s clear, let’s call it.” Or the perennial: “we could really use another win this quarter.” On any single call, probably not. But the methodology only earns its edge if the threshold is held without exception. Make one exception and the system erodes. The cost may be invisible on any single test, but it surfaces in the cumulative numbers.

§05 · Selected Artifacts

5 entries · Core Digital Media, 2010–2019
F-01 · Sample size calculator · Testing methodology · Per-segment calibration · 2015
F-02 · RPV wins log · Test record · Win threshold documentation · 2015
F-03 · Form fragmentation matrix · UX artifact · One-at-a-time question architecture · 2014
F-04 · Turndown threshold doc · Process artifact · Early-exit decision criteria · 2014
F-05 · Mobile LP samples · Design artifact · Control and challenger variants · 2014