Optimizing search engines using clickthrough data
Source: Conference on Knowledge Discovery in Data
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Edmonton, Alberta, Canada
SESSION: Web search and navigation
Pages: 133 - 142  
Year of Publication: 2002
ISBN:1-58113-567-X
Author
Thorsten Joachims  Cornell University, Ithaca, NY
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
AAAI
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 56,   Downloads (12 Months): 577,   Citation Count: 122
DOI: http://doi.acm.org/10.1145/775047.775067

ABSTRACT

This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
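The abstract does not spell out how a click log becomes training data, so the following is only a minimal sketch under a stated assumption: a clicked result is treated as preferred over every unclicked result that was ranked above it for the same query. The function name `preference_pairs` and the document identifiers are hypothetical and chosen for illustration. Pairs of this kind are what a pairwise ranking learner such as an SVM could then be trained on.

```python
def preference_pairs(ranking, clicked):
    """Derive (preferred, less_preferred) pairs from one logged query.

    ranking: list of document ids in the order they were presented.
    clicked: set of document ids the user clicked on.

    Assumption (not stated in the abstract): a click on a result means
    the user preferred it to each unclicked result shown above it.
    """
    pairs = []
    for position, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        # Compare the clicked document only with results ranked above it.
        for above in ranking[:position]:
            if above not in clicked:
                pairs.append((doc, above))
    return pairs


# Hypothetical log entry: ten results shown, the user clicked 1, 3 and 7.
presented = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pairs = preference_pairs(presented, clicked={1, 3, 7})
```

Under this assumption the clicks yield relative judgments only (document 3 over document 2, document 7 over documents 2, 4, 5 and 6), not absolute relevance labels, which is consistent with the abstract's point that no expert relevance judgments are required.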


