Simple and Scalable Response Prediction for Display Advertising

Authors:
Olivier Chapelle (Criteo)
Eren Manavoglu (Microsoft)
Romer Rosales (LinkedIn)

ACM Transactions on Intelligent Systems and Technology, Volume 5, Issue 4, Article No. 61, pp. 1–34
https://doi.org/10.1145/2532128

Published: 29 December 2014

Abstract

Clickthrough and conversion rate estimation are two core prediction tasks in display advertising. In this article, we present a machine learning framework based on logistic regression that is designed specifically for this setting. The resulting system is easy to implement and deploy, is highly scalable (we have trained it on terabytes of data), and produces models with state-of-the-art accuracy.
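
As a rough illustration of what a logistic regression click model can look like, the sketch below trains on hashed categorical features with plain stochastic gradient descent. It is not the system described in the article: the hashing scheme, the bucket count, the optimizer, the hyperparameters, and every function and feature name here are assumptions chosen for brevity.

```python
import math
import zlib

NUM_BUCKETS = 2 ** 18  # hypothetical size of the hashed feature space


def hashed_indices(features):
    """Map categorical name/value pairs to bucket indices (hashing trick)."""
    return sorted({
        zlib.crc32(f"{name}={value}".encode("utf-8")) % NUM_BUCKETS
        for name, value in features.items()
    })


def predict(weights, indices):
    """Predicted click probability: sigmoid of the sum of active weights."""
    score = sum(weights.get(i, 0.0) for i in indices)
    score = max(-30.0, min(30.0, score))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-score))


def train(examples, epochs=20, lr=0.1, l2=1e-4):
    """Fit sparse weights by SGD on the L2-regularized logistic loss."""
    weights = {}
    for _ in range(epochs):
        for features, clicked in examples:
            idx = hashed_indices(features)
            # Gradient of the log loss with respect to the linear score.
            error = predict(weights, idx) - clicked
            for i in idx:
                w = weights.get(i, 0.0)
                weights[i] = w - lr * (error + l2 * w)
    return weights


# Toy data (made up): impressions described by publisher and ad, with a click label.
examples = [
    ({"publisher": "news", "ad": "shoes"}, 1),
    ({"publisher": "news", "ad": "cars"}, 0),
    ({"publisher": "blog", "ad": "shoes"}, 1),
    ({"publisher": "blog", "ad": "cars"}, 0),
]
weights = train(examples)
p_shoes = predict(weights, hashed_indices({"publisher": "news", "ad": "shoes"}))
p_cars = predict(weights, hashed_indices({"publisher": "news", "ad": "cars"}))
```

On this toy data the model should assign a higher click probability to the ad that was clicked in training than to the one that was not. Hashing keeps the weight table bounded regardless of how many distinct feature values appear, at the cost of occasional collisions.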



Published in
ACM Transactions on Intelligent Systems and Technology Volume 5, Issue 4
Special Sections on Diversity and Discovery in Recommender Systems, Online Advertising and Regular Papers
January 2015
390 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/2699158
Editor:
Huan Liu
Arizona State University, USA
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 29 December 2014
- Accepted: 1 August 2013
- Revised: 1 April 2013
- Received: 1 December 2012
Author Tags
Display advertising
click prediction
distributed learning
feature selection
hashing
machine learning
Qualifiers
- research-article
- Research
- Refereed