ABSTRACT
This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment, which shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
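The abstract does not spell out how clicks become training data or how the SVM is trained, so the following Python sketch is only an illustration under stated assumptions. It assumes that a clicked result is preferred over every unclicked result ranked above it, and it replaces a real SVM solver with plain subgradient descent on a regularized pairwise hinge loss. The function names (`preference_pairs`, `train_pairwise_linear`, `score`), the document identifiers, and the two-dimensional feature vectors are all hypothetical.

```python
import random


def preference_pairs(ranking, clicked):
    """Return (preferred, less_preferred) doc-id pairs from one logged query.

    Assumption (not stated in the abstract): a clicked result is taken to be
    preferred over every unclicked result that was ranked above it.
    """
    clicked = set(clicked)
    pairs = []
    for pos, doc in enumerate(ranking):
        if doc in clicked:
            for skipped in ranking[:pos]:
                if skipped not in clicked:
                    pairs.append((doc, skipped))
    return pairs


def train_pairwise_linear(pairs, features, dim, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Fit weights w so that w.x(preferred) exceeds w.x(other) by a margin.

    Subgradient descent on a regularized pairwise hinge loss; a simple
    stand-in for an SVM solver, not the paper's actual training procedure.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for better, worse in pairs:
            # Difference of the two feature vectors for this preference.
            diff = [a - b for a, b in zip(features[better], features[worse])]
            margin = sum(wi * di for wi, di in zip(w, diff))
            violated = margin < 1.0
            for i in range(dim):
                grad = reg * w[i] - (diff[i] if violated else 0.0)
                w[i] -= lr * grad
    return w


def score(w, x):
    """Linear retrieval score of a feature vector x under weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical toy log: one query, four results, the user clicked the third.
ranking = ["d1", "d2", "d3", "d4"]
clicked = ["d3"]
features = {
    "d1": [1.0, 0.0],
    "d2": [0.8, 0.1],
    "d3": [0.2, 1.0],
    "d4": [0.1, 0.2],
}

pairs = preference_pairs(ranking, clicked)
w = train_pairwise_linear(pairs, features, dim=2)
reranked = sorted(ranking, key=lambda d: score(w, features[d]), reverse=True)
print(pairs)
print(reranked)
```

Training on difference vectors keeps the learned function a single linear scorer, so new results can be ranked by sorting on `score` without comparing pairs at query time.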