A/B Testing
I was cleaning up my laptop when I stumbled on a presentation from 2004 called Front Line Internet Analytics at Amazon.com. I’m not exactly sure where I found it originally, but I suspect it was probably from Greg Linden’s blog. There are 31 slides in all, but I would direct your attention to slides 11, 12 and 13. They discuss Amazon’s A/B testing strategy, the problems they had to work around, and their findings.
For those of you who dislike PDFs as much as I do, here are the slide contents.
A/B Tests
- A/B test basics
- Control/treatment(s) test for limited time
- Randomly show one or more treatments to visitors
- Measure parameters such as units and revenue by category (and total), session time, session length, etc
- A/B tests insulate test from external factors
- Examples:
- New home page design
- Moving features around the page
- Different algorithms for recommendations
- Changing search relevance rankings
- Usually launch feature if desired metrics are (statistically) significantly better
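The slides don’t say how Amazon actually assigned visitors to buckets, but a common way to get the “random yet repeatable” behaviour these bullets describe is to hash a stable visitor ID together with the experiment name. Here is a minimal Python sketch; the function name, experiment name, and 50/50 split are my own illustration, not anything from the presentation.

```python
# Hash-based treatment assignment (illustrative only, not Amazon's implementation).
# Hashing a stable visitor ID with the experiment name yields a pseudo-random
# but deterministic bucket, so a visitor keeps the same treatment on every visit.
import hashlib

def assign_bucket(visitor_id: str, experiment: str, treatments: list[str]) -> str:
    """Deterministically map a visitor to one of the treatments."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    index = int(digest, 16) % len(treatments)
    return treatments[index]

# Example: 50/50 split between control and a new home page design.
print(assign_bucket("visitor-12345", "home_page_redesign", ["control", "treatment"]))
```

Because the assignment depends only on the visitor ID and the experiment name, adding a second experiment with a different name reshuffles visitors independently of the first one.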
Challenges to A/B Tests
- Multiple A/B tests are running every day
- Challenges
- Conflicting A/B tests (two experiments touch same feature). Scheduling, QA’ing
- Long term effects. Some features are “cool” for the first two weeks and may die down
- Primacy effect. Changing navigation may degrade performance temporarily
- Consistency. Same person may get a different treatment when browsing from home vs. from work
- Statistical tests. Distributions are far from normal. They have large mass at zero (e.g. no purchase)
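That last point is worth dwelling on: per-visitor revenue has a big spike at zero (most sessions buy nothing), so a plain normal-theory t-test is on shaky ground. One standard workaround is a resampling test on the difference in mean revenue per visitor. The sketch below uses simulated data and a pooled bootstrap; it is a generic illustration, not whatever Amazon actually did.

```python
# Bootstrap test for a difference in mean revenue per visitor when the
# distribution is zero-heavy. Simulated data; purely illustrative.
import numpy as np

rng = np.random.default_rng(42)

def bootstrap_p_value(control: np.ndarray, treatment: np.ndarray, n_boot: int = 10_000) -> float:
    """Two-sided test of a difference in means by resampling under the null."""
    observed = treatment.mean() - control.mean()
    pooled = np.concatenate([control, treatment])
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Under the null, both groups come from one shared distribution.
        resampled = rng.choice(pooled, size=pooled.size, replace=True)
        diffs[i] = resampled[control.size:].mean() - resampled[:control.size].mean()
    return float(np.mean(np.abs(diffs) >= abs(observed)))

# Simulated revenue per session: ~5% vs ~6% of visitors purchase, rest spend zero.
control = np.where(rng.random(5000) < 0.05, rng.exponential(40.0, 5000), 0.0)
treatment = np.where(rng.random(5000) < 0.06, rng.exponential(40.0, 5000), 0.0)

print(f"p-value: {bootstrap_p_value(control, treatment):.3f}")
```

The resampling makes no normality assumption, which is the point: with a large mass at zero, the shape of the sampling distribution is driven by the conversion rate as much as by order size.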
Data Trumps Intuitions
- Amazon constantly tests ideas.
- Even with experienced employees, many ideas fail to show significant improvement
- Many times a prototype is easier to build than a model that predicts user response