research-article

Sampling dilemma: towards effective data sampling for click prediction in sponsored search

Authors:
Jun Feng, Tsinghua University, Beijing, China
Jiang Bian, Microsoft Research, Beijing, China
Taifeng Wang, Microsoft Research, Beijing, China
Wei Chen, Microsoft Research, Beijing, China
Xiaoyan Zhu, Tsinghua University, Beijing, China
Tie-Yan Liu, Microsoft Research, Beijing, China

WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining, February 2014, Pages 103–112, https://doi.org/10.1145/2556195.2556242

Published: 24 February 2014

ABSTRACT

Precise prediction of the probability that users click on ads plays a key role in sponsored search. State-of-the-art sponsored search systems typically employ a machine learning approach to click prediction. While previous studies have paid much attention to extracting useful features and building effective models, they have overlooked a less obvious but essentially important challenge: data sampling. To fulfill the learning objective of click prediction, it is necessary not only to ensure that the sampled training data follow an input distribution similar to that of the real-world data, but also to guarantee that the sampled training data yield a conditional output distribution, i.e., click-through rate (CTR), consistent with that of the real-world data. However, due to the sparseness of clicks in sponsored search, these two requirements are somewhat contradictory and hard to satisfy simultaneously. In this paper, we first conduct a theoretical analysis to reveal this sampling dilemma, followed by a thorough data analysis demonstrating that the straightforward random sampling method may not be effective in balancing these two kinds of consistency simultaneously. To address this problem, we propose a new sampling algorithm that retains the consistency between the sampled data and the real-world data in terms of both input distribution and conditional output distribution. Large-scale evaluations on the click-through logs of a commercial search engine demonstrate that this new sampling algorithm can effectively address the sampling dilemma. Further experiments illustrate that, by using the training data obtained by our new sampling algorithm, we can learn a model with much higher click prediction accuracy.
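The dilemma can be made concrete with a small simulation. The sketch below is not the sampling algorithm proposed in the paper, which the abstract does not describe; it contrasts two common baselines on a synthetic log: uniform random sampling, and negative downsampling (keep every click, subsample non-clicks at rate w) followed by the standard recalibration p = qw / (1 - q + qw). The log generator, the group sizes and CTR ranges, and all function names (make_log, uniform_sample, negative_downsample, recalibrate) are hypothetical assumptions made for illustration.

```python
import random

# Hypothetical impression log of (group_id, clicked) rows, where group_id stands in
# for a query-ad pair. Sizes and CTRs are synthetic assumptions that only mimic a
# heavy-tailed traffic distribution with sparse clicks.
def make_log(num_groups=2000, seed=0):
    rng = random.Random(seed)
    log = []
    for g in range(num_groups):
        impressions = max(1, 1000 // (g + 1))  # Zipf-like head, long tail of rare groups
        true_ctr = rng.uniform(0.005, 0.05)    # clicks are sparse
        log.extend((g, rng.random() < true_ctr) for _ in range(impressions))
    return log


def uniform_sample(log, rate, seed=1):
    """Keep each impression independently with probability `rate`."""
    rng = random.Random(seed)
    return [row for row in log if rng.random() < rate]


def negative_downsample(log, neg_rate, seed=2):
    """Keep every clicked impression; keep each non-click with probability `neg_rate`."""
    rng = random.Random(seed)
    return [row for row in log if row[1] or rng.random() < neg_rate]


def recalibrate(q, neg_rate):
    """Map a CTR measured on negatively downsampled data back to the original scale.

    With all clicks kept and non-clicks kept at rate w, a true CTR p becomes
    q = p / (p + (1 - p) * w); solving for p gives p = q * w / (1 - q + q * w).
    """
    return q * neg_rate / (1.0 - q + q * neg_rate)


def ctr(rows):
    """Fraction of rows that are clicks."""
    return sum(clicked for _, clicked in rows) / len(rows)


def groups_with_clicks(rows):
    """Number of distinct groups that still contain at least one click."""
    return len({g for g, clicked in rows if clicked})


if __name__ == "__main__":
    log = make_log()
    uni = uniform_sample(log, rate=0.1)
    neg = negative_downsample(log, neg_rate=0.1)
    for name, rows in [("full log", log), ("uniform 10%", uni), ("neg-down 10%", neg)]:
        print(f"{name:12s} rows={len(rows):6d} ctr={ctr(rows):.4f} "
              f"groups_with_clicks={groups_with_clicks(rows)}")
    print(f"neg-down CTR after recalibration: {recalibrate(ctr(neg), 0.1):.4f}")
```

Uniform sampling preserves the input distribution and the overall CTR in expectation, but with clicks this sparse far fewer groups retain any click, so per-group CTR estimates for those groups collapse to zero. Negative downsampling keeps every click, but it inflates the measured CTR until it is recalibrated, and because a group with CTR p is retained at rate p + (1 - p)w, it also over-weights high-CTR groups relative to the real traffic.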

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
  2. World Wide Web
    1. Web applications
    2. Web services

Recommendations

New features for query dependent sponsored search click prediction
WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide Web

Click prediction for sponsored search is an important problem for commercial search engines. Good click prediction algorithm greatly affects on the revenue of the search engine, user experience and brings more clicks to landing pages of advertisers. ...

Psychological advertising: exploring user psychology for click prediction in sponsored search
KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Precise click prediction is one of the key components in the sponsored search system. Previous studies usually took advantage of two major kinds of information for click prediction, i.e., relevance information representing the similarity between ads and ...

Personalized click prediction in sponsored search
WSDM '10: Proceedings of the third ACM international conference on Web search and data mining

Sponsored search is a multi-billion dollar business that generates most of the revenue for search engines. Predicting the probability that users click on ads is crucial to sponsored search because the prediction is used to influence ranking, filtering, ...

Published in
WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining
February 2014, 712 pages
ISBN: 9781450323512
DOI: 10.1145/2556195
General Chairs: Ben Carterette (University of Delaware, USA), Fernando Diaz (Microsoft Research, USA)
Program Chairs: Carlos Castillo (Qatar Computing Research Institute, Qatar), Donald Metzler (Google, USA)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
click prediction
data sampling
online advertising
sponsored search
Acceptance Rates
WSDM '14 Paper Acceptance Rate: 64 of 355 submissions, 18%
Overall Acceptance Rate: 498 of 2,863 submissions, 17%
Article Metrics
- Total Citations: 3
- Total Downloads: 390
- Downloads (Last 12 months): 3
- Downloads (Last 6 weeks): 0