DOI: 10.1145/2556195.2556242
research-article

Sampling dilemma: towards effective data sampling for click prediction in sponsored search

Published: 24 February 2014

ABSTRACT

Accurately predicting the probability that a user clicks on an ad plays a key role in sponsored search. State-of-the-art sponsored search systems typically use machine learning for click prediction. While paying much attention to extracting useful features and building effective models, previous studies have overlooked less obvious but important challenges in data sampling. To fulfill the learning objective of click prediction, the sampled training data must not only follow an input distribution similar to the real-world one, but also yield a conditional output distribution, i.e., click-through rate (CTR), consistent with the real-world data. However, because clicks are sparse in sponsored search, these two requirements are difficult to satisfy simultaneously. In this paper, we first present a theoretical analysis that reveals this sampling dilemma, followed by a thorough data analysis demonstrating that straightforward random sampling may not balance the two kinds of consistency. To address this problem, we propose a new sampling algorithm that retains consistency between the sampled data and the real world in terms of both input distribution and conditional output distribution. Large-scale evaluations on click-through logs from a commercial search engine demonstrate that the new sampling algorithm effectively addresses the sampling dilemma. Further experiments show that a model trained on data obtained by the new sampling algorithm achieves much higher click-prediction accuracy.
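
The abstract names two kinds of consistency but does not describe the proposed algorithm, so the following is only a minimal sketch of how those two quantities could be measured on a sample. Everything in it is an assumption made for illustration: the synthetic log, the query names, the gap metrics (`input_gap`, `ctr_gap`), and the two baseline samplers (uniform sampling and negative down-sampling). None of it is taken from the paper.

```python
import random
from collections import Counter

def build_log(spec):
    """Build a synthetic impression log as a list of (query, clicked) pairs."""
    log = []
    for query, (impressions, ctr) in spec.items():
        clicks = round(impressions * ctr)
        log += [(query, 1)] * clicks + [(query, 0)] * (impressions - clicks)
    return log

def query_shares(log):
    """Share of impressions per query (a stand-in for the input distribution)."""
    counts = Counter(q for q, _ in log)
    total = sum(counts.values())
    return {q: n / total for q, n in counts.items()}

def query_ctrs(log):
    """Click-through rate per query (the conditional output distribution)."""
    shown = Counter(q for q, _ in log)
    clicked = Counter(q for q, c in log if c)
    return {q: clicked[q] / shown[q] for q in shown}

def input_gap(full, sample):
    """L1 distance between the query distributions of the log and the sample."""
    p, s = query_shares(full), query_shares(sample)
    return sum(abs(p[q] - s.get(q, 0.0)) for q in p)

def ctr_gap(full, sample):
    """Mean absolute per-query CTR difference; a missing query counts as CTR 0."""
    p, s = query_ctrs(full), query_ctrs(sample)
    return sum(abs(p[q] - s.get(q, 0.0)) for q in p) / len(p)

def uniform_sample(log, rate, rng):
    """Keep every impression independently with probability `rate`."""
    return [row for row in log if rng.random() < rate]

def negative_downsample(log, rate, rng):
    """Keep all clicks; keep non-clicks with probability `rate`."""
    return [row for row in log if row[1] == 1 or rng.random() < rate]

# Hypothetical queries with (impressions, true CTR); clicks are deliberately sparse.
spec = {"shoes": (30000, 0.05), "flights": (15000, 0.02), "insurance": (5000, 0.01)}
rng = random.Random(0)
log = build_log(spec)
uni = uniform_sample(log, 0.1, rng)
neg = negative_downsample(log, 0.1, rng)

for name, sample in [("uniform", uni), ("negative down-sampling", neg)]:
    print(name, len(sample),
          round(input_gap(log, sample), 4),
          round(ctr_gap(log, sample), 4))
```

In this toy setting, negative down-sampling keeps every click but inflates per-query CTR, while uniform sampling stays close on both measures at the cost of retaining only a fraction of the already sparse clicks. Whether this matches the trade-off analyzed in the paper cannot be confirmed from the abstract alone.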

Published in

WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining
February 2014
712 pages
ISBN: 9781450323512
DOI: 10.1145/2556195

Copyright © 2014 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

WSDM '14 paper acceptance rate: 64 of 355 submissions, 18%. Overall acceptance rate: 498 of 2,863 submissions, 17%.
