Statistical tests commonly used for A/B testing, like the two-sample z-test, rely on the assumption that the experimental observations (i.e., samples) are independent. If this assumption is not met, the test becomes unreliable in the sense that it may not...
In "Why Most Published Research Findings Are False", John Ioannidis argues that if most of the hypotheses we test are false, we end up with more false research findings than true ones, even if we do rigorous hypothesis testing. The argument hinges...
In previous posts, I discussed methods for A/B testing when we are interested only in the success rate of a design. Often, however, success rates don’t tell the whole story. In the case of...
This post gives an introduction to working with the new February release of the Wikipedia Clickstream dataset. The data shows how people get to a Wikipedia article and what articles they click on next. In other words, it gives a...
Although classical hypothesis tests are probably still the most widespread technique for online A/B tests, they are not the most appropriate. In this post, I will describe the shortcomings of hypothesis testing for web optimization and propose a method of...