Applying the Delta Method in Metric Analytics: A Practical Guide with Novel Ideas

Authors:
Alex Deng, Microsoft Corporation, Redmond, WA, USA
Ulf Knoblich, Microsoft Corporation, Redmond, WA, USA
Jiannan Lu, Microsoft Corporation, Redmond, WA, USA

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, July 2018, Pages 233–242. https://doi.org/10.1145/3219819.3219919

Published: 19 July 2018

ABSTRACT

During the last decade, the information technology industry has adopted a data-driven culture, relying on online metrics to measure and monitor business performance. Under the setting of big data, the majority of such metrics approximately follow normal distributions, opening up potential opportunities to model them directly without extra model assumptions and solve big data problems via closed-form formulas using distributed algorithms at a fraction of the cost of simulation-based procedures like bootstrap. However, certain attributes of the metrics, such as their corresponding data generating processes and aggregation levels, pose numerous challenges for constructing trustworthy estimation and inference procedures. Motivated by four real-life examples in metric development and analytics for large-scale A/B testing, we provide a practical guide to applying the Delta method, one of the most important tools from the classic statistics literature, to address the aforementioned challenges. We emphasize the central role of the Delta method in metric analytics by highlighting both its classic and novel applications.
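
As a rough illustration of the kind of closed-form calculation the abstract refers to, the sketch below applies the Delta method to a ratio of two means, such as clicks per page view when the independent unit is the user. It is a minimal example under stated assumptions, not the authors' implementation: the function name `ratio_delta_method` and the simulated data are hypothetical, units are assumed i.i.d., and the mean of the denominator is assumed to be well away from zero.

```python
import numpy as np


def ratio_delta_method(y, x):
    """Estimate mean(y) / mean(x) and its first-order Delta-method variance.

    y, x: per-unit totals (for example clicks and page views per user),
    assumed i.i.d. across units. Hypothetical helper for illustration.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = y.size
    mean_y, mean_x = y.mean(), x.mean()
    ratio = mean_y / mean_x

    # 2x2 sample covariance matrix of (y, x) across units.
    cov = np.cov(y, x, ddof=1)
    var_y, var_x, cov_yx = cov[0, 0], cov[1, 1], cov[0, 1]

    # First-order Taylor expansion of g(a, b) = a / b around the means.
    variance = (var_y - 2.0 * ratio * cov_yx + ratio**2 * var_x) / (n * mean_x**2)
    return ratio, variance


# Simulated per-user data (assumed values, for demonstration only).
rng = np.random.default_rng(0)
views = rng.poisson(5, size=10_000) + 1   # page views per user, at least 1
clicks = rng.binomial(views, 0.1)         # clicks per user

ratio, variance = ratio_delta_method(clicks, views)
half_width = 1.96 * np.sqrt(variance)     # normal-approximation 95% interval
print(f"ratio = {ratio:.4f}, 95% CI half-width = {half_width:.4f}")
```

The variance needs only per-unit sums, means, and second moments, which can be accumulated in a single distributed pass, whereas a bootstrap would recompute the ratio on many resamples.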

Index Terms

1. Mathematics of computing
  1. Probability and statistics

Published in
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018, 2925 pages
ISBN: 9781450355520
DOI: 10.1145/3219819
General Chairs: Yike Guo (Imperial College London), Faisal Farooq (IBM)
Publisher: Association for Computing Machinery, New York, NY, United States
Copyright © 2018 ACM

Author Tags
a/b testing
big data
distributed algorithm
large sample theory
longitudinal study
online metrics
quantile inference
randomization
Qualifiers
- research-article
Acceptance Rates
KDD '18 Paper Acceptance Rate: 107 of 983 submissions, 11%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%