I was cleaning up my laptop when I stumbled on a presentation from 2004 called Front Line Internet Analytics at Amazon.com. I'm not exactly sure where I found it originally, but I suspect it was from Greg Linden's blog. There are 31 slides in all, but I would direct your attention to slides 11, 12, and 13. They discuss Amazon's A/B testing strategy, the problems they had to work around, and their findings.

For those of you who dislike PDFs as much as I do, here are the slide contents.

A/B Tests

  • A/B test basics
    • Control/treatment(s) test for limited time
    • Randomly show one or more treatments to visitors
  • Measure parameters such as units and revenue by category (and in total), session time, session length, etc.
  • A/B tests insulate the test from external factors
  • Examples:
    • New home page design
    • Moving features around the page
    • Different algorithms for recommendations
    • Changing search relevance rankings
  • Usually launch feature if desired metrics are (statistically) significantly better
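
The recipe on that slide boils down to: randomly split visitors, log a per-visitor metric, and launch only if the treatment beats control by a statistically significant margin. Here is a minimal sketch of that flow in Python. To be clear, this is my own illustration, not Amazon's pipeline: the visitor behavior and revenue numbers are made up, and a plain two-sample t-test stands in for whatever test they actually ran.

```python
import random
from scipy import stats

def run_visit(variant):
    # Hypothetical per-visitor revenue; pretend the treatment
    # nudges average spend up slightly.
    base = 2.0 if variant == "treatment" else 1.8
    return max(0.0, random.gauss(base, 3.0))

control, treatment = [], []
for _ in range(10_000):
    # Randomly show the control or the treatment to each visitor.
    variant = random.choice(["control", "treatment"])
    revenue = run_visit(variant)
    (treatment if variant == "treatment" else control).append(revenue)

mean_c = sum(control) / len(control)
mean_t = sum(treatment) / len(treatment)

# Launch the feature only if the desired metric is significantly better.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"control={mean_c:.3f} treatment={mean_t:.3f} p={p_value:.4f}")
if p_value < 0.05 and mean_t > mean_c:
    print("Launch the treatment.")
else:
    print("No significant improvement; keep the control.")
```

The next slide explains why the naive significance test at the end is the weakest link in this picture.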

Challenges to A/B Tests

  • Multiple A/B tests are running every day
  • Challenges
    • Conflicting A/B tests (two experiments touch same feature). Scheduling, QA’ing
    • Long term effects. Some features are “cool” for the first two weeks and may die down
    • Primacy effect. Changing navigation may degrade performance temporarily
    • Consistency. The same person may get a different treatment when browsing from home vs. from work
    • Statistical tests. Distributions are far from normal. They have large mass at zero (e.g. no purchase)
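
That last point is worth dwelling on: if most visitors buy nothing, per-visitor revenue has a big spike at zero and a long tail, so the normality assumptions behind a plain t-test get shaky. The slides don't say how Amazon handled this, but one common workaround is a permutation test on the difference in means, which makes no distributional assumptions at all. A rough sketch on zero-inflated toy data:

```python
import random

def permutation_p_value(control, treatment, iterations=10_000):
    """Two-sided permutation test for the difference in means.

    Works on zero-inflated metrics like per-visitor revenue,
    where most observations are exactly 0 (no purchase).
    """
    observed = (sum(treatment) / len(treatment)) - (sum(control) / len(control))
    pooled = control + treatment
    n_treat = len(treatment)
    extreme = 0
    for _ in range(iterations):
        # Reshuffle the labels under the null hypothesis of "no difference".
        random.shuffle(pooled)
        t_sample, c_sample = pooled[:n_treat], pooled[n_treat:]
        diff = (sum(t_sample) / len(t_sample)) - (sum(c_sample) / len(c_sample))
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / iterations

# Toy data: roughly 90% of visitors spend nothing.
control = [0.0] * 900 + [random.expovariate(1 / 30.0) for _ in range(100)]
treatment = [0.0] * 880 + [random.expovariate(1 / 32.0) for _ in range(120)]
print("p ≈", permutation_p_value(control, treatment))
```

The consistency problem on the same slide is typically solved by deriving the treatment assignment deterministically from a hash of the customer ID rather than a per-session coin flip, so the same person sees the same variant whether they're at home or at work.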

Data Trumps Intuitions

  • Amazon constantly tests ideas.
  • Even with experienced employees, many ideas fail to show significant improvement
  • Many times a prototype is easier to build than a model that predicts user response