DOI: 10.1145/3219819.3219919
research-article

Applying the Delta Method in Metric Analytics: A Practical Guide with Novel Ideas

Published: 19 July 2018

ABSTRACT

During the last decade, the information technology industry has adopted a data-driven culture, relying on online metrics to measure and monitor business performance. Under the setting of big data, the majority of such metrics approximately follow normal distributions, opening up potential opportunities to model them directly without extra model assumptions and solve big data problems via closed-form formulas using distributed algorithms at a fraction of the cost of simulation-based procedures like bootstrap. However, certain attributes of the metrics, such as their corresponding data generating processes and aggregation levels, pose numerous challenges for constructing trustworthy estimation and inference procedures. Motivated by four real-life examples in metric development and analytics for large-scale A/B testing, we provide a practical guide to applying the Delta method, one of the most important tools from the classic statistics literature, to address the aforementioned challenges. We emphasize the central role of the Delta method in metric analytics by highlighting both its classic and novel applications.
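As a rough illustration of the kind of closed-form calculation the abstract alludes to, the sketch below applies the textbook first-order Delta method to the variance of a ratio of two sample means, for example clicks per page view aggregated per user. It is not taken from the paper: the function name `delta_ratio_variance`, the simulated data, and the assumption that users are independent and identically distributed are all illustrative choices.

```python
import random
from statistics import fmean


def delta_ratio_variance(x, y):
    """Delta-method variance estimate for mean(y) / mean(x).

    x and y are paired per-unit totals (hypothetical example: per-user
    page views and clicks), assumed i.i.d. across units.
    """
    n = len(x)
    mx, my = fmean(x), fmean(y)
    # Sample variances and covariance with the usual n - 1 denominator.
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    r = my / mx
    # First-order Taylor expansion of y_bar / x_bar around (mu_x, mu_y).
    return (vy - 2 * r * cxy + r ** 2 * vx) / (n * mx ** 2)


# Simulated data (an assumption for illustration): each user has 1 to 20
# page views, and each view is clicked independently with probability 0.1.
rng = random.Random(0)
views = [rng.randint(1, 20) for _ in range(5000)]
clicks = [sum(rng.random() < 0.1 for _ in range(v)) for v in views]

ratio = fmean(clicks) / fmean(views)
se = delta_ratio_variance(views, clicks) ** 0.5
ci = (ratio - 1.96 * se, ratio + 1.96 * se)  # approximate 95% interval
print(f"ratio={ratio:.4f}, se={se:.5f}, ci=({ci[0]:.4f}, {ci[1]:.4f})")
```

The estimate needs only per-unit sums, sums of squares, and cross-products, so in principle it can be computed in a single distributed pass over the data, whereas a bootstrap would require repeated resampling.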



    • Published in

      KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
      July 2018
      2925 pages
      ISBN: 9781450355520
      DOI: 10.1145/3219819

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      KDD '18 Paper Acceptance Rate: 107 of 983 submissions, 11%. Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%.
