ABSTRACT
Effective ranking functions are an essential part of commercial search engines. We focus on developing a regression framework for learning ranking functions that improve the relevance of search engines serving diverse streams of user queries. We explore supervised learning methodology from machine learning, and we distinguish two types of relevance judgments used as training data: 1) absolute relevance judgments arising from explicit labeling of search results; and 2) relative relevance judgments extracted from user clickthroughs of search results or converted from the absolute relevance judgments. We propose a novel optimization framework emphasizing the use of relative relevance judgments. The main contribution is the development of a regression-based algorithm that can be applied to objective functions involving preference data, i.e., data indicating that one document is more relevant than another with respect to a query. Experiments are carried out using data sets obtained from a commercial search engine. Our results show significant improvements of our proposed methods over some existing methods.
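To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the paper's actual algorithm. It shows (a) one plausible way to convert absolute relevance judgments into relative ones, and (b) an objective over preference data together with its gradient with respect to the current scores. The squared hinge loss, the `margin` parameter, and the function names `preference_pairs` and `pairwise_loss_and_gradient` are all assumptions made for illustration; the abstract does not specify the objective function.

```python
from itertools import combinations


def preference_pairs(judgments):
    """Convert absolute labels into (query, preferred_doc, other_doc) triples.

    `judgments` maps a query id to a dict {doc_id: grade}, where a higher
    grade means more relevant. Documents are compared only within the same
    query, and ties produce no preference. (Hypothetical helper.)
    """
    pairs = []
    for query, grades in judgments.items():
        for (doc_a, grade_a), (doc_b, grade_b) in combinations(grades.items(), 2):
            if grade_a > grade_b:
                pairs.append((query, doc_a, doc_b))
            elif grade_b > grade_a:
                pairs.append((query, doc_b, doc_a))
    return pairs


def pairwise_loss_and_gradient(scores, pairs, margin=1.0):
    """Assumed squared hinge loss over preference pairs, plus its gradient.

    `scores` maps (query, doc_id) to the current ranking score h.
    Loss: 0.5 * sum over pairs of max(0, margin - (h_better - h_worse))**2.
    The returned gradient is taken with respect to each individual score.
    """
    loss = 0.0
    grad = {key: 0.0 for key in scores}
    for query, better, worse in pairs:
        violation = margin - (scores[(query, better)] - scores[(query, worse)])
        if violation > 0:  # the pair is mis-ordered or inside the margin
            loss += 0.5 * violation ** 2
            grad[(query, better)] -= violation
            grad[(query, worse)] += violation
    return loss, grad


# Toy data: one query with three graded documents, all scores starting at 0.
judgments = {"q1": {"d1": 2, "d2": 0, "d3": 2}}
pairs = preference_pairs(judgments)
scores = {("q1", doc): 0.0 for doc in judgments["q1"]}
loss, grad = pairwise_loss_and_gradient(scores, pairs)
```

In a functional gradient descent or gradient boosting setting (both listed among the keywords), the negative of this gradient could serve as the regression target for the next base learner, which is one way a regression method can be applied to preference data. That connection is an interpretation of the abstract rather than a statement of the authors' procedure.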
INDEX TERMS
Primary Classification:
H. Information Systems
H.3 INFORMATION STORAGE AND RETRIEVAL
H.3.3 Information Search and Retrieval
Subjects: Retrieval models
Additional Classification:
H. Information Systems
H.4 INFORMATION SYSTEMS APPLICATIONS
H.4.m Miscellaneous
General Terms: Algorithms, Experimentation, Theory
Keywords: absolute relevance judgment, clickthroughs, functional gradient descent, gradient boosting, machine learning, preferences, ranking function, regression, relative relevance judgment