This repository is a companion to our blog post "Recommended Resources for Starting A/B Testing." If there are any resources you think we should add, please file an issue or open a pull request!
- Making Big Changes
- Designing with Data - Book, ideal for designers and PMs, or for analysts looking to improve their partnership with designers.
- Statistical Methods in Online A/B Testing - Book that maps statistical concepts to A/B testing. Good for a thorough deep dive or as a reference book.
- Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained - Microsoft.
- Communicating A/B Test Results for Conversion Rates with Ratios and Uncertainty Intervals - How and what metrics to show stakeholders. Especially helpful if you're designing an experimentation platform UI.
- It’s All A/Bout Testing: The Netflix Experimentation Platform - Netflix.
- How Not to Run an A/B Test - A classic on stats mistakes.
- Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology - Call for more statisticians to collaborate with industry.
- The top 3 mistakes that make your A/B test results invalid
- Pitfalls of Long-Term Online Controlled Experiments - Microsoft.
- Optional stopping in data collection: p values, Bayes factors, credible intervals, precision - Shows how different stopping criteria balloon error rates, and offers one safe option. Helpful if you're Bayesian-curious, especially if you hope credible intervals are a safe stopping criterion. (See the peeking simulation after this list.)
- From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks - LinkedIn.
- Data-Driven Metric Development for Online Controlled Experiments: Seven Lessons Learned - Use past experiment results to pick metrics with good properties.
- Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data - The CUPED method: run faster experiments by using pre-experiment data to reduce variance. Industry standard at big tech companies. (See the CUPED sketch after this list.)
- Innovating Faster on Personalization Algorithms at Netflix Using Interleaving
- Early Detection of Long Term Evaluation Criteria in Online Controlled Experiments
- Improving the Sensitivity of Online Controlled Experiments: Case Studies at Netflix
- Peeking at A/B Tests - Explains Optimizely's statistical approach at the time the paper was published (controlling the false discovery rate).
- A/B Testing with Fat Tails - A meta-analysis of Bing's experiments found fat-tailed impacts (i.e., huge impact from "outlier" wins).
- Integrating Mediators and Moderators in Research Design - Crucial concepts in experiment design and analysis.
- Why Most Published Research Findings Are False - Classic paper from early in the replication crisis.
- P-Curve: A Key to the File-Drawer - Useful tool to analyze evidence quality for a set of experiments.
- The influence of hidden researcher decisions in applied microeconomics - Important concept for experiment generalizability.
- Switchback Tests and Randomized Experimentation Under Network Effects at DoorDash
- How Airbnb Measures Future Value to Standardize Tradeoffs - Airbnb’s system to predict long term value of actions without experiments.
- Estimating Network Effects Using Naturally Occurring Peer Notification Queue Counterfactuals
- Experimentation in a Ridesharing Marketplace - Some domains require a randomization unit other than the user; an example from ridesharing.
- Understanding noninferiority trials
- Recursive partitioning for heterogeneous causal effects (PNAS) - An ML approach to find segments with heterogeneous treatment effects.
- Limiting bias from test-control interference in online marketplace experiments
- An Empirical Meta-analysis of E-commerce A/B Testing Strategies - See which types of e-commerce treatments performed well in a meta-analysis. YMMV.
- https://arxiv.org/pdf/2107.08995.pdf - Compares observational studies where treatments are opt-in against randomized A/B tests. Opt-in "experiments" produce unreliable results.
- Still Not Significant - List of creative phrases used to describe non-significant results over the years.
- Estimated Costs of Pivotal Trials for Novel Therapeutic Agents Approved by the US Food and Drug Administration - Cost estimate for pivotal clinical trials. Good to share when people complain about costs/inconvenience of A/B tests.
- Sample Size Calculator - Simple, frequentist approach for conversion rates. (See the sample-size sketch after this list.)
- Booking.com power calculator - Plan sample sizes with more advanced features.
- Chi Squared Test - Simple, frequentist analysis for conversion rate differences. (See the chi-squared example after this list.)
- So You Think You Can Test - Simulation game to test and develop your experiment analysis skills.
- Test and Roll - Sample size planning with the Test & Roll framework (advanced). (See the closed-form sketch after this list.)
- GoodUI - Repository of winning UIs from A/B tests. YMMV.
- Test and Roll - R package to set sample sizes based on the Test & Roll framework.
- Multtest - R package for resampling-based multiple hypothesis testing. (See the Python FDR example after this list.)
- PyMC - Python package for Bayesian analysis. (See the minimal A/B model after this list.)
- PlanOut - Facebook's framework for specifying and deploying online field experiments.
- PlanAlyzer - Linter for experimental designs implemented in PlanOut.
- Ax - Facebook's Python package for adaptive experimentation (bandits and Bayesian optimization).
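
The sketches below illustrate a few techniques from the resources above. They are minimal, hedged examples, not code from the linked resources, and every number in them is hypothetical.

First, the peeking problem behind "How Not to Run an A/B Test" and the optional-stopping paper: a simulation of an A/A test (no true difference) that is checked repeatedly and stopped at the first p < 0.05. The realized false positive rate lands far above the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n_per_arm, n_peeks = 1000, 5000, 10
checkpoints = np.linspace(n_per_arm // n_peeks, n_per_arm, n_peeks, dtype=int)

false_positives = 0
for _ in range(n_sims):
    # Both arms share the same true conversion rate, so any "win" is a false positive.
    a = rng.binomial(1, 0.05, n_per_arm)
    b = rng.binomial(1, 0.05, n_per_arm)
    for n in checkpoints:
        if stats.ttest_ind(a[:n], b[:n]).pvalue < 0.05:
            false_positives += 1
            break

# With 10 peeks, expect roughly 15-20% instead of the nominal 5%.
print(f"False positive rate with peeking: {false_positives / n_sims:.1%}")
```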
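
Next, the CUPED idea from the pre-experiment-data paper: adjust the in-experiment metric Y with a correlated pre-experiment covariate X, which shrinks variance without changing the mean. This is a sketch on synthetic data; a real implementation would, for example, estimate theta pooled across both arms.

```python
import numpy as np

def cuped_adjust(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Return Y adjusted by theta = cov(X, Y) / var(X); same mean, lower variance."""
    theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

rng = np.random.default_rng(0)
x = rng.normal(10, 3, 50_000)               # pre-experiment metric
y = 2 + 0.8 * x + rng.normal(0, 2, 50_000)  # in-experiment metric correlated with x

y_adj = cuped_adjust(y, x)
print(f"var(Y)       = {y.var():.2f}")
print(f"var(Y_cuped) = {y_adj.var():.2f}")  # substantially smaller -> faster experiments
```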
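
The sample-size calculators above perform roughly this frequentist computation, shown here with statsmodels and hypothetical baseline/lift values:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline, target = 0.10, 0.11                     # detect a lift from 10% to 11%
effect = proportion_effectsize(target, baseline)  # Cohen's h

n = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.8,
                                 alternative='two-sided')
print(f"~{n:,.0f} users per arm")
```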
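
The linked chi-squared test amounts to a test of independence on a 2x2 table of conversion counts (made-up counts below):

```python
from scipy.stats import chi2_contingency

#        converted  not converted
table = [[200, 1800],   # control
         [250, 1750]]   # treatment
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```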
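
For Test & Roll, the symmetric normal-normal case has a closed-form profit-maximizing test size. This sketch assumes that formula from the paper; the inputs are hypothetical, and the R package above is the safer choice for real planning.

```python
import math

def test_and_roll_n(N: float, s: float, sigma: float) -> float:
    """Per-arm test size for a symmetric test & roll: N is the deployment
    population, s the response noise std, sigma the prior std of the arm means."""
    r = (s / sigma) ** 2
    return math.sqrt(N / 4 * r + (0.75 * r) ** 2) - 0.75 * r

print(f"~{test_and_roll_n(N=100_000, s=0.3, sigma=0.05):.0f} users per arm")
```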
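
Multtest is an R package; as a Python illustration of the same multiple-testing idea, here is Benjamini-Hochberg FDR adjustment with statsmodels (p-values are made up):

```python
from statsmodels.stats.multitest import multipletests

p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')
for p, p_adj, r in zip(p_values, p_adjusted, reject):
    print(f"raw p = {p:.3f} -> BH-adjusted = {p_adj:.3f}{'  *significant*' if r else ''}")
```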
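
Finally, a minimal Bayesian A/B conversion model in PyMC, with hypothetical counts:

```python
import pymc as pm

with pm.Model():
    # Uniform Beta(1, 1) priors on each arm's conversion rate.
    p_a = pm.Beta("p_a", 1, 1)
    p_b = pm.Beta("p_b", 1, 1)
    pm.Binomial("obs_a", n=2000, p=p_a, observed=200)  # control: 200/2000 converted
    pm.Binomial("obs_b", n=2000, p=p_b, observed=250)  # treatment: 250/2000 converted
    lift = pm.Deterministic("lift", p_b - p_a)
    trace = pm.sample(2000, tune=1000, progressbar=False)

# Posterior probability that the treatment beats the control.
print(float((trace.posterior["lift"] > 0).mean()))
```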