Abstract
Clickthrough and conversion rate estimation are two core prediction tasks in display advertising. In this article, we present a machine learning framework based on logistic regression that is designed to tackle the specifics of display advertising. The resulting system has the following characteristics: it is easy to implement and deploy; it is highly scalable (we have trained it on terabytes of data); and it provides models with state-of-the-art accuracy.
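To make the core idea concrete, the following is a minimal sketch of logistic regression used to estimate a click probability. It is not the system described in the article: the binary indicator features, plain batch gradient descent, the L2 penalty, the hyperparameter values, and the toy data are all assumptions chosen only for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, features):
    """Estimated click probability for one impression.

    `features` is a list of indices of the binary features that are active.
    """
    return sigmoid(sum(weights[i] for i in features))


def train(examples, n_features, epochs=200, lr=0.5, l2=1e-3):
    """Batch gradient descent on the L2-regularized log loss (illustrative only)."""
    weights = [0.0] * n_features
    n = len(examples)
    for _ in range(epochs):
        # Gradient of the penalty term (l2 / 2) * ||w||^2.
        grad = [l2 * w for w in weights]
        for features, clicked in examples:
            err = predict(weights, features) - clicked
            for i in features:
                grad[i] += err / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Hypothetical toy data: feature 0 = bias, 1 = "ad A shown", 2 = "ad B shown".
# Ad A is clicked 3 times out of 4, ad B 1 time out of 4.
examples = (
    [([0, 1], 1)] * 3 + [([0, 1], 0)] * 1
    + [([0, 2], 1)] * 1 + [([0, 2], 0)] * 3
)
weights = train(examples, n_features=3)
p_a = predict(weights, [0, 1])
p_b = predict(weights, [0, 2])
print(f"estimated CTR, ad A: {p_a:.3f}; ad B: {p_b:.3f}")
```

On this toy data the model should rank ad A above ad B, with the regularizer pulling both estimates slightly toward 0.5 relative to the raw click frequencies.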