AB Testing and the Importance of Independent Observations

Statistical tests commonly used for AB testing, like the two-sample z-test, rely on the assumption that the experimental observations (i.e. samples) are independent. If this assumption is not met, the test becomes unreliable in the sense that it may not...

What if AB Testing is like Science ... and most results are false?

In "Why Most Published Research Findings Are False", John Ioannidis argues that if most hypotheses we test are false, we end up with more false research findings than true findings, even if we do rigorous hypothesis testing. The argument hinges...

Beautifully Bayesian - How to do AB Testing with Discrete Rewards?

In previous posts, I have discussed methods for AB testing when we are interested only in the success rate of a design. It is often the case, however, that success rates don’t tell the whole story. In the case of...

Wikipedia Clickstream - Getting Started

This post gives an introduction to working with the new February release of the Wikipedia Clickstream dataset. The data shows how people get to a Wikipedia article and what articles they click on next. In other words, it gives a...

How Naive AB Testing Goes Wrong and How to Fix It

Although classical hypothesis tests are probably still the most widespread technique for online A/B tests, they are not the most appropriate. In this post, I will describe the shortcomings of hypothesis testing for web optimization and propose a method of...