ABSTRACT
Precise prediction of the probability that users click on ads plays a key role in sponsored search. State-of-the-art sponsored search systems typically employ machine learning for click prediction. While paying much attention to extracting useful features and building effective models, previous studies have overlooked less obvious but important challenges in data sampling. To fulfill the learning objective of click prediction, the sampled training data must not only follow an input distribution similar to that of the real world, but also yield a conditional output distribution, i.e., click-through rate (CTR), consistent with the real-world data. However, because clicks are sparse in sponsored search, these two requirements are somewhat at odds and hard to satisfy simultaneously. In this paper, we first present a theoretical analysis that reveals this sampling dilemma, followed by a thorough data analysis demonstrating that straightforward random sampling may not balance these two kinds of consistency. To address this problem, we propose a new sampling algorithm that retains the consistency between the sampled data and the real world in terms of both input distribution and conditional output distribution. Large-scale evaluations on click-through logs from a commercial search engine demonstrate that the new sampling algorithm effectively addresses the sampling dilemma. Further experiments show that a model trained on data obtained by our sampling algorithm achieves much higher accuracy in click prediction.
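The two kinds of consistency named above can be measured on a toy log. The following is a minimal sketch under stated assumptions, not the sampling algorithm proposed in the paper: the keys `head` and `tail`, the impression counts, the 2% CTR, the 5% sampling rate, and all function names are hypothetical and chosen only for illustration. It draws a uniform random sample from a simulated click log and reports, for the full log and the sample, the share of impressions per key (input distribution) and the per-key CTR (conditional output distribution).

```python
import random
from collections import Counter


def simulate_log(true_ctr, impressions, rng):
    """Build a synthetic log of (key, clicked) rows; true_ctr maps key -> CTR."""
    log = []
    for key, n in impressions.items():
        for _ in range(n):
            log.append((key, rng.random() < true_ctr[key]))
    return log


def input_share(log):
    """Empirical input distribution: fraction of impressions per key."""
    counts = Counter(key for key, _ in log)
    total = sum(counts.values())
    return {key: c / total for key, c in counts.items()}


def ctr_by_key(log):
    """Empirical conditional output distribution: clicks / impressions per key."""
    shown = Counter()
    clicked = Counter()
    for key, was_clicked in log:
        shown[key] += 1
        clicked[key] += int(was_clicked)
    return {key: clicked[key] / shown[key] for key in shown}


def uniform_sample(log, rate, rng):
    """Keep each impression independently with probability `rate`."""
    return [row for row in log if rng.random() < rate]


# Hypothetical setup: one frequent key and one rare key with the same true CTR.
rng = random.Random(0)
true_ctr = {"head": 0.02, "tail": 0.02}
impressions = {"head": 200_000, "tail": 500}

full_log = simulate_log(true_ctr, impressions, rng)
sample = uniform_sample(full_log, rate=0.05, rng=rng)

for name, log in (("full", full_log), ("sample", sample)):
    print(name, "share:", input_share(log), "ctr:", ctr_by_key(log))
```

In a setup like this, the rare key contributes only a few dozen impressions to the sample, so its CTR estimate rests on very few clicks, while its impression share is far less affected. That is one way to see why sparse clicks make the two requirements hard to satisfy together.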
Index Terms
- Sampling dilemma: towards effective data sampling for click prediction in sponsored search