ABSTRACT
During the last decade, the information technology industry has adopted a data-driven culture, relying on online metrics to measure and monitor business performance. In big-data settings, the majority of such metrics approximately follow normal distributions. This makes it possible to model them directly without extra model assumptions, and to solve big-data problems via closed-form formulas using distributed algorithms, at a fraction of the cost of simulation-based procedures such as the bootstrap. However, certain attributes of the metrics, such as their data-generating processes and aggregation levels, pose numerous challenges for constructing trustworthy estimation and inference procedures. Motivated by four real-life examples in metric development and analytics for large-scale A/B testing, we provide a practical guide to applying the Delta method, one of the most important tools from the classic statistics literature, to address these challenges. We emphasize the central role of the Delta method in metric analytics by highlighting both its classic and novel applications.
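As a rough illustration of the kind of closed-form calculation the abstract refers to, the sketch below applies the standard first-order Delta method to a ratio of two sample means (for example, clicks per page view when users are the i.i.d. units). It is not taken from the paper: the function name `ratio_delta_variance` and the toy data are hypothetical, and the formula assumes i.i.d. pairs with a denominator mean well away from zero.

```python
import math


def ratio_delta_variance(x, y):
    """First-order Delta-method variance estimate for mean(y) / mean(x).

    x, y: equal-length sequences of per-unit totals (hypothetical example:
    page views and clicks per user). Assumes the pairs are i.i.d.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Unbiased sample variances and covariance of the per-unit values.
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    r = mean_y / mean_x
    # Var(r) ~= (var_y - 2 r cov_xy + r^2 var_x) / (n * mean_x^2)
    var_r = (var_y - 2 * r * cov_xy + r * r * var_x) / (n * mean_x ** 2)
    return r, var_r


# Toy data, made up for illustration only.
views = [1.0, 2.0, 3.0]
clicks = [2.0, 2.0, 5.0]
ratio, var_ratio = ratio_delta_variance(views, clicks)
half_width = 1.96 * math.sqrt(var_ratio)  # normal-approximation 95% interval
print(ratio, var_ratio, (ratio - half_width, ratio + half_width))
```

Only sums of `x`, `y`, `x*x`, `y*y`, and `x*y` are needed, so the same estimate can be assembled from per-partition partial sums in a distributed job, with no resampling.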
Index Terms
- Applying the Delta Method in Metric Analytics: A Practical Guide with Novel Ideas