DOI: 10.1145/1835449.1835495
Research article

Active learning for ranking through expected loss optimization

Published: 19 July 2010

Abstract

Learning to rank arises in many information retrieval applications, ranging from Web search engines and online advertising to recommendation systems. In learning to rank, the performance of a ranking model is strongly affected by the number of labeled examples in the training set; on the other hand, obtaining labeled examples for training data is very expensive and time-consuming. This presents a great need for active learning approaches that select the most informative examples for ranking; however, the literature still contains very limited work addressing active learning for ranking. In this paper, we propose a general active learning framework, Expected Loss Optimization (ELO), for ranking. The ELO framework is applicable to a wide range of ranking functions. Under this framework, we derive a novel algorithm, Expected DCG Loss Optimization (ELO-DCG), to select the most informative examples. Furthermore, we investigate both query-level and document-level active learning for ranking and propose a two-stage ELO-DCG algorithm that incorporates both query and document selection into active learning. Extensive experiments on real-world Web search data sets demonstrate the great potential and effectiveness of the proposed framework and algorithms.
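The core idea described in the abstract, scoring each unlabeled query by how much DCG could be lost if the current ranking is wrong and labeling the riskiest queries first, can be sketched as follows. This is a minimal illustration under our own assumptions (score uncertainty represented by a bootstrap-style ensemble of predicted score vectors, DCG gain taken as the raw score), not the authors' exact derivation; the names `expected_dcg_loss` and `select_queries` are hypothetical.

```python
import math

def dcg(scores_in_rank_order):
    """DCG with gain = relevance score and a log2 position discount."""
    return sum(s / math.log2(i + 2) for i, s in enumerate(scores_in_rank_order))

def expected_dcg_loss(per_model_scores):
    """Expected DCG loss for one query.

    per_model_scores: one score vector per ensemble member, each giving
    predicted relevance for the query's documents. The loss is the
    expected DCG of the per-member ideal rankings (the best we could
    hope for) minus the expected DCG of the single ranking induced by
    the mean scores (what the current model would actually serve).
    """
    n_models = len(per_model_scores)
    n_docs = len(per_model_scores[0])
    # Ranking actually produced: sort documents by mean predicted score.
    mean = [sum(m[d] for m in per_model_scores) / n_models for d in range(n_docs)]
    order = sorted(range(n_docs), key=lambda d: -mean[d])
    # Expected DCG of that ranking, averaged over ensemble members.
    realized = sum(dcg([m[d] for d in order]) for m in per_model_scores) / n_models
    # Expected DCG of each member's own optimal ranking.
    ideal = sum(dcg(sorted(m, reverse=True)) for m in per_model_scores) / n_models
    return ideal - realized

def select_queries(candidates, k):
    """Pick the k queries whose labels would reduce expected DCG loss most.

    candidates: dict mapping query id -> per_model_scores.
    """
    return sorted(candidates, key=lambda q: -expected_dcg_loss(candidates[q]))[:k]
```

With a real ranker, the ensemble would come from bootstrapped training sets or similar perturbations; a query on which the members agree contributes near-zero loss, while disagreement (and thus potential DCG improvement from labeling) drives the score up.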




    Published In

SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
July 2010, 944 pages
ISBN: 9781450301534
DOI: 10.1145/1835449
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. active learning
    2. expected loss optimization
    3. ranking


    Conference

    SIGIR '10

    Acceptance Rates

SIGIR '10 paper acceptance rate: 87 of 520 submissions (17%)
Overall acceptance rate: 792 of 3,983 submissions (20%)


Cited By

    • (2024) TripleSurv: Triplet Time-Adaptive Coordinate Learning Approach for Survival Analysis. IEEE Transactions on Knowledge and Data Engineering 36(12): 9464-9475. DOI: 10.1109/TKDE.2024.3450910
    • (2024) A Discrimination-Guided Active Learning Method Based on Marginal Representations for Industrial Compound Fault Diagnosis. IEEE Transactions on Automation Science and Engineering 21(4): 6411-6422. DOI: 10.1109/TASE.2023.3325271
    • (2024) A Simple yet Effective Framework for Active Learning to Rank. Machine Intelligence Research 21(1): 169-183. DOI: 10.1007/s11633-023-1422-z
    • (2023) End-to-end learning for stochastic optimization. Proceedings of the 40th International Conference on Machine Learning, pp. 29455-29472. DOI: 10.5555/3618408.3619633
    • (2022) Learning to Rank for Information Retrieval and Natural Language Processing. Online publication date: 2 April 2022
    • (2020) Discovering Anomalies by Incorporating Feedback from an Expert. ACM Transactions on Knowledge Discovery from Data 14(4): 1-32. DOI: 10.1145/3396608
    • (2019) Tri-partition cost-sensitive active learning through kNN. Soft Computing 23(5): 1557-1572. DOI: 10.1007/s00500-017-2879-x
    • (2018) Uncertainty sampling for action recognition via maximizing expected average precision. Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 964-970. DOI: 10.5555/3304415.3304552
    • (2018) Selective Gradient Boosting for Effective Learning to Rank. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 155-164. DOI: 10.1145/3209978.3210048
    • (2018) Speeding up algorithm selection using average ranking and active testing by introducing runtime. Machine Learning 107(1): 79-108. DOI: 10.1007/s10994-017-5687-8
