This repository is a companion to our blog post "Recommended Resources for Starting A/B Testing." If there are any resources you think we should add, please file an issue or open a pull request!
- Making Big Changes
- Designing with Data - Book, ideal for designers and PMs, or for analysts looking to improve their partnership with designers.
- Statistical Methods in Online A/B Testing - Book that maps statistical concepts to A/B testing. Good for a thorough deep dive or as a reference book.
- Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained - Microsoft.
- Communicating A/B Test Results for Conversion Rates with Ratios and Uncertainty Intervals - How and what metrics to show stakeholders. Especially helpful if you're designing an experimentation platform UI.
- It’s All A/Bout Testing: The Netflix Experimentation Platform - Netflix.
- How Not to Run an A/B Test - A classic on stats mistakes.
- Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology - Call for more statisticians to collaborate with industry.
- The top 3 mistakes that make your A/B test results invalid
- Pitfalls of Long-Term Online Controlled Experiments - Microsoft.
- Optional stopping in data collection: p values, Bayes factors, credible intervals, precision - Shows how different stopping criteria balloon error rates, and offers one safe option. Helpful if you're Bayesian-curious, especially if you hope credible intervals are a safe stopping criterion. (See the peeking simulation after this list.)
- From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks - LinkedIn.
- Data-Driven Metric Development for Online Controlled Experiments: Seven Lessons Learned - Use past experiment results to pick metrics with good properties.
- Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data - The CUPED method: run faster experiments by using pre-experiment data to reduce variance. Industry standard at big tech companies. (See the CUPED sketch after this list.)
- Innovating Faster on Personalization Algorithms at Netflix Using Interleaving
- Early Detection of Long Term Evaluation Criteria in Online Controlled Experiments
- Improving the Sensitivity of Online Controlled Experiments: Case Studies at Netflix
- Peeking at A/B Tests - Explains Optimizely's statistical approach at the time the paper was published (controlling the false discovery rate).
- A/B Testing with Fat Tails - A meta-analysis of Bing's experiments found fat-tailed impacts (i.e., huge impact from "outlier" wins).
- Integrating Mediators and Moderators in Research Design - Crucial concepts in experiment design and analysis.
- Why Most Published Research Findings Are False - Classic paper from early in the replication crisis.
- P-Curve: A Key to the File-Drawer - Useful tool to analyze evidence quality for a set of experiments.
- The influence of hidden researcher decisions in applied microeconomics - Important concept for experiment generalizability.
- Switchback Tests and Randomized Experimentation Under Network Effects at DoorDash
- How Airbnb Measures Future Value to Standardize Tradeoffs - Airbnb’s system to predict long term value of actions without experiments.
- Estimating Network Effects Using Naturally Occurring Peer Notification Queue Counterfactuals
- Experimentation in a Ridesharing Marketplace - Some domains require a randomization unit other than the user; an example from ridesharing.
- Understanding noninferiority trials
- Recursive partitioning for heterogeneous causal effects (PNAS) - An ML approach to find segments with heterogeneous treatment effects.
- Limiting bias from test-control interference in online marketplace experiments
- An Empirical Meta-analysis of E-commerce A/B Testing Strategies - See which types of e-commerce treatments performed well in a meta-analysis. YMMV.
- https://arxiv.org/pdf/2107.08995.pdf - Compares observational studies where treatments are opt-in against randomized A/B tests. Opt-in "experiments" produce unreliable results.
- Still Not Significant - List of creative phrases used to describe non-significant results over the years.
- Estimated Costs of Pivotal Trials for Novel Therapeutic Agents Approved by the US Food and Drug Administration - Cost estimate for pivotal clinical trials. Good to share when people complain about costs/inconvenience of A/B tests.
- Sample Size Calculator - Simple, frequentist approach for conversion rates. (See the sample-size sketch after this list.)
- Booking.com power calculator - Plan sample sizes with more advanced features.
- Chi Squared Test - Simple, frequentist analysis for conversion rate differences. (See the chi-squared example after this list.)
- So You Think You Can Test - Simulation game to test and develop your experiment analysis skills.
- Test and Roll - Sample size planning with the Test & Roll framework (advanced). (See the closed-form sketch after this list.)
- GoodUI - Repository of winning UIs from A/B tests. YMMV.
- Test and Roll - R package to set sample sizes based on the Test & Roll framework.
- Multtest - R package for resampling-based multiple hypothesis testing. (See the Python FDR example after this list.)
- PyMC - Python package for Bayesian analysis. (See the minimal A/B model after this list.)
- PlanOut - Facebook's framework for specifying and deploying online field experiments.
- PlanAlyzer - Linter for experimental designs implemented in PlanOut.
- Ax - Facebook's Python package for adaptive experimentation (bandits and Bayesian optimization).
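
The sketches below illustrate a few techniques from the resources above. They are minimal, hedged examples, not code from the linked resources, and every number in them is hypothetical.

First, the peeking problem behind "How Not to Run an A/B Test" and the optional-stopping paper: a simulation of an A/A test (no true difference) that is checked repeatedly and stopped at the first p < 0.05. The realized false positive rate lands far above the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n_per_arm, n_peeks = 1000, 5000, 10
checkpoints = np.linspace(n_per_arm // n_peeks, n_per_arm, n_peeks, dtype=int)

false_positives = 0
for _ in range(n_sims):
    # Both arms share the same true conversion rate, so any "win" is a false positive.
    a = rng.binomial(1, 0.05, n_per_arm)
    b = rng.binomial(1, 0.05, n_per_arm)
    for n in checkpoints:
        if stats.ttest_ind(a[:n], b[:n]).pvalue < 0.05:
            false_positives += 1
            break

# With 10 peeks, expect roughly 15-20% instead of the nominal 5%.
print(f"False positive rate with peeking: {false_positives / n_sims:.1%}")
```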
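
Next, the CUPED idea from the pre-experiment-data paper: adjust the in-experiment metric Y with a correlated pre-experiment covariate X, which shrinks variance without changing the mean. This is a sketch on synthetic data; a real implementation would, for example, estimate theta pooled across both arms.

```python
import numpy as np

def cuped_adjust(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Return Y adjusted by theta = cov(X, Y) / var(X); same mean, lower variance."""
    theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

rng = np.random.default_rng(0)
x = rng.normal(10, 3, 50_000)               # pre-experiment metric
y = 2 + 0.8 * x + rng.normal(0, 2, 50_000)  # in-experiment metric correlated with x

y_adj = cuped_adjust(y, x)
print(f"var(Y)       = {y.var():.2f}")
print(f"var(Y_cuped) = {y_adj.var():.2f}")  # substantially smaller -> faster experiments
```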
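
The sample-size calculators above perform roughly this frequentist computation, shown here with statsmodels and hypothetical baseline/lift values:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline, target = 0.10, 0.11                     # detect a lift from 10% to 11%
effect = proportion_effectsize(target, baseline)  # Cohen's h

n = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.8,
                                 alternative='two-sided')
print(f"~{n:,.0f} users per arm")
```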
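
The linked chi-squared test amounts to a test of independence on a 2x2 table of conversion counts (made-up counts below):

```python
from scipy.stats import chi2_contingency

#        converted  not converted
table = [[200, 1800],   # control
         [250, 1750]]   # treatment
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```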
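
For Test & Roll, the symmetric normal-normal case has a closed-form profit-maximizing test size. This sketch assumes that formula from the paper; the inputs are hypothetical, and the R package above is the safer choice for real planning.

```python
import math

def test_and_roll_n(N: float, s: float, sigma: float) -> float:
    """Per-arm test size for a symmetric test & roll: N is the deployment
    population, s the response noise std, sigma the prior std of the arm means."""
    r = (s / sigma) ** 2
    return math.sqrt(N / 4 * r + (0.75 * r) ** 2) - 0.75 * r

print(f"~{test_and_roll_n(N=100_000, s=0.3, sigma=0.05):.0f} users per arm")
```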
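
Multtest is an R package; as a Python illustration of the same multiple-testing idea, here is Benjamini-Hochberg FDR adjustment with statsmodels (p-values are made up):

```python
from statsmodels.stats.multitest import multipletests

p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')
for p, p_adj, r in zip(p_values, p_adjusted, reject):
    print(f"raw p = {p:.3f} -> BH-adjusted = {p_adj:.3f}{'  *significant*' if r else ''}")
```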
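
Finally, a minimal Bayesian A/B conversion model in PyMC, with hypothetical counts:

```python
import pymc as pm

with pm.Model():
    # Uniform Beta(1, 1) priors on each arm's conversion rate.
    p_a = pm.Beta("p_a", 1, 1)
    p_b = pm.Beta("p_b", 1, 1)
    pm.Binomial("obs_a", n=2000, p=p_a, observed=200)  # control: 200/2000 converted
    pm.Binomial("obs_b", n=2000, p=p_b, observed=250)  # treatment: 250/2000 converted
    lift = pm.Deterministic("lift", p_b - p_a)
    trace = pm.sample(2000, tune=1000, progressbar=False)

# Posterior probability that the treatment beats the control.
print(float((trace.posterior["lift"] > 0).mean()))
```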