research-article

Simple and Scalable Response Prediction for Display Advertising

Published: 29 December 2014

Abstract

Click-through and conversion rate estimation are two core prediction tasks in display advertising. In this article, we present a machine learning framework based on logistic regression that is specifically designed to address the particular challenges of display advertising. The resulting system has the following characteristics: it is easy to implement and deploy, it is highly scalable (we have trained it on terabytes of data), and it provides models with state-of-the-art accuracy.
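The abstract does not spell out the model's details. As a hedged illustration only, the following Python sketch shows plain logistic regression for click prediction over hashed categorical features, trained with a single pass of stochastic gradient descent and scored with log loss. The feature hashing scheme (CRC32 of `field=value` modulo `NUM_BUCKETS`), the learning rate, the L2 weight, and every name used (`HashedLogisticRegression`, `hash_features`, `train`, `log_loss`) are assumptions made for this example, not the article's implementation.

```python
import math
import zlib
from typing import Dict, List, Tuple

# Hypothetical hash-space size; an illustrative assumption, not from the article.
NUM_BUCKETS = 2 ** 18


def hash_features(raw: Dict[str, str], num_buckets: int = NUM_BUCKETS) -> List[int]:
    """Map categorical fields such as {"publisher": "p1"} to bucket indices.

    Hashing "field=value" with CRC32 is an illustrative choice; it keeps the
    weight vector a fixed size no matter how many distinct values appear.
    """
    return [zlib.crc32(f"{k}={v}".encode("utf-8")) % num_buckets
            for k, v in sorted(raw.items())]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class HashedLogisticRegression:
    """Hypothetical logistic regression model over hashed binary features."""

    def __init__(self, num_buckets: int = NUM_BUCKETS, lr: float = 0.1, l2: float = 1e-6):
        self.num_buckets = num_buckets
        self.w = [0.0] * num_buckets  # one weight per hash bucket
        self.bias = 0.0
        self.lr = lr
        self.l2 = l2

    def predict(self, raw: Dict[str, str]) -> float:
        """Estimated probability of a click for one impression."""
        idx = hash_features(raw, self.num_buckets)
        return sigmoid(self.bias + sum(self.w[i] for i in idx))

    def update(self, raw: Dict[str, str], clicked: int) -> None:
        """One SGD step on the L2-regularized log loss."""
        idx = hash_features(raw, self.num_buckets)
        p = sigmoid(self.bias + sum(self.w[i] for i in idx))
        g = p - clicked  # derivative of log loss with respect to the logit
        self.bias -= self.lr * g
        for i in idx:
            self.w[i] -= self.lr * (g + self.l2 * self.w[i])


def train(data: List[Tuple[Dict[str, str], int]], epochs: int = 1, **kwargs) -> HashedLogisticRegression:
    """Fit a model by streaming over (fields, clicked) pairs."""
    model = HashedLogisticRegression(**kwargs)
    for _ in range(epochs):
        for raw, clicked in data:
            model.update(raw, clicked)
    return model


def log_loss(model: HashedLogisticRegression, data: List[Tuple[Dict[str, str], int]]) -> float:
    """Average negative log-likelihood, with probabilities clipped for safety."""
    total = 0.0
    for raw, y in data:
        p = min(max(model.predict(raw), 1e-15), 1.0 - 1e-15)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)
```

Calling `train(data)` on a list of `(fields, clicked)` pairs returns a model whose `predict` gives an estimated click probability, and `log_loss` measures how well those probabilities fit observed clicks. Hashing keeps memory fixed as the number of distinct feature values grows, which is one common way to apply linear models to very large sparse data; collisions between buckets are simply tolerated.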



        • Published in

          ACM Transactions on Intelligent Systems and Technology, Volume 5, Issue 4
          Special Sections on Diversity and Discovery in Recommender Systems, Online Advertising and Regular Papers
          January 2015
          390 pages
          ISSN: 2157-6904
          EISSN: 2157-6912
          DOI: 10.1145/2699158
          • Editor: Huan Liu

          Copyright © 2014 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 29 December 2014
          • Accepted: 1 August 2013
          • Revised: 1 April 2013
          • Received: 1 December 2012
          Published in TIST Volume 5, Issue 4


          Qualifiers

          • research-article
          • Research
          • Refereed
