I was cleaning up my laptop when I stumbled on a presentation from 2004 called Front Line Internet Analytics at Amazon.com. I'm not exactly sure where I found it originally, but I suspect it was from Greg Linden's blog. There are 31 slides in all, but I would direct your attention to slides 11, 12, and 13. They discuss Amazon's A/B testing strategy, the problems they had to work around, and their findings.

For those of you who dislike PDFs as much as I do, here are the slide contents.

A/B Tests

  • A/B test basics
    • Control/treatment(s) test for limited time
    • Randomly show one or more treatments to visitors
  • Measure parameters such as units and revenue by category (and in total), session time, session length, etc.
  • A/B tests insulate the test from external factors
  • Examples:
    • New home page design
    • Moving features around the page
    • Different algorithms for recommendations
    • Changing search relevance rankings
  • Usually launch feature if desired metrics are (statistically) significantly better
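
The recipe on that slide boils down to: randomly split visitors, log a per-visitor metric, and launch only if the treatment beats control by a statistically significant margin. Here is a minimal sketch of that flow in Python. To be clear, this is my own illustration, not Amazon's pipeline: the visitor behavior and revenue numbers are made up, and a plain two-sample t-test stands in for whatever test they actually ran.

```python
import random
from scipy import stats

def run_visit(variant):
    # Hypothetical per-visitor revenue; pretend the treatment
    # nudges average spend up slightly.
    base = 2.0 if variant == "treatment" else 1.8
    return max(0.0, random.gauss(base, 3.0))

control, treatment = [], []
for _ in range(10_000):
    # Randomly show the control or the treatment to each visitor.
    variant = random.choice(["control", "treatment"])
    revenue = run_visit(variant)
    (treatment if variant == "treatment" else control).append(revenue)

mean_c = sum(control) / len(control)
mean_t = sum(treatment) / len(treatment)

# Launch the feature only if the desired metric is significantly better.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"control={mean_c:.3f} treatment={mean_t:.3f} p={p_value:.4f}")
if p_value < 0.05 and mean_t > mean_c:
    print("Launch the treatment.")
else:
    print("No significant improvement; keep the control.")
```

The next slide explains why the naive significance test at the end is the weakest link in this picture.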

Challenges to A/B Tests

  • Multiple A/B tests are running every day
  • Challenges
    • Conflicting A/B tests (two experiments touch same feature). Scheduling, QA’ing
    • Long term effects. Some features are “cool” for the first two weeks and may die down
    • Primacy effect. Changing navigation may degrade performance temporarily
    • Consistency. The same person may get a different treatment when browsing from home vs. from work
    • Statistical tests. Distributions are far from normal. They have large mass at zero (e.g. no purchase)
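
That last point is worth dwelling on: if most visitors buy nothing, per-visitor revenue has a big spike at zero and a long tail, so the normality assumptions behind a plain t-test get shaky. The slides don't say how Amazon handled this, but one common workaround is a permutation test on the difference in means, which makes no distributional assumptions at all. A rough sketch on zero-inflated toy data:

```python
import random

def permutation_p_value(control, treatment, iterations=10_000):
    """Two-sided permutation test for the difference in means.

    Works on zero-inflated metrics like per-visitor revenue,
    where most observations are exactly 0 (no purchase).
    """
    observed = (sum(treatment) / len(treatment)) - (sum(control) / len(control))
    pooled = control + treatment
    n_treat = len(treatment)
    extreme = 0
    for _ in range(iterations):
        # Reshuffle the labels under the null hypothesis of "no difference".
        random.shuffle(pooled)
        t_sample, c_sample = pooled[:n_treat], pooled[n_treat:]
        diff = (sum(t_sample) / len(t_sample)) - (sum(c_sample) / len(c_sample))
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / iterations

# Toy data: roughly 90% of visitors spend nothing.
control = [0.0] * 900 + [random.expovariate(1 / 30.0) for _ in range(100)]
treatment = [0.0] * 880 + [random.expovariate(1 / 32.0) for _ in range(120)]
print("p ≈", permutation_p_value(control, treatment))
```

The consistency problem on the same slide is typically solved by deriving the treatment assignment deterministically from a hash of the customer ID rather than a per-session coin flip, so the same person sees the same variant whether they're at home or at work.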

Data Trumps Intuitions

  • Amazon constantly tests ideas.
  • Even with experienced employees, many ideas fail to show significant improvement
  • Many times a prototype is easier to build than a model that predicts user response