ABSTRACT
Approaches to training and evaluating search engines often rely on crowdsourced assessments of document relevance with respect to a user query. To make such assessments usable for either evaluation or learning, we propose a new framework for inferring true document relevance from crowdsourced data, one that is simpler than previous approaches and achieves better performance. For each assessor, we model quality and bias in the form of Gaussian-distributed class conditionals over relevance grades. For each document, we model true relevance and difficulty as continuous variables. We estimate all parameters from crowdsourced data, demonstrating better inference of relevance as well as realistic models of both documents and assessors.
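The generative view described above can be sketched as follows. This is an illustrative Python sketch under assumed functional forms, not the paper's exact model: an assessor perceives a document's true relevance corrupted by Gaussian noise whose scale grows with both document difficulty and assessor noise, shifted by the assessor's bias, and the perceived value is thresholded into an ordinal grade. All parameter names (`z`, `s`, `bias`, `sigma`, `cuts`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population of documents and assessors.
n_docs, n_assessors = 100, 5
z = rng.normal(0.0, 1.0, n_docs)          # true relevance per document
s = rng.gamma(2.0, 0.5, n_docs)           # difficulty per document (noise scale)
bias = rng.normal(0.0, 0.3, n_assessors)  # per-assessor grade bias
sigma = rng.gamma(2.0, 0.3, n_assessors)  # per-assessor noise (inverse quality)
cuts = np.array([-1.0, 0.0, 1.0])         # thresholds separating 4 ordinal grades

def observe(d, a):
    """Sample assessor a's grade for document d: true relevance plus the
    assessor's bias plus Gaussian noise, thresholded into an ordinal grade."""
    perceived = z[d] + bias[a] + rng.normal(0.0, sigma[a] * s[d])
    return int(np.searchsorted(cuts, perceived))  # grade in {0, 1, 2, 3}

# Simulated crowdsourced label set: (document, assessor, grade) triples.
labels = [(d, a, observe(d, a))
          for d in range(n_docs) for a in range(n_assessors)]
```

In the actual framework these parameters are unknown and estimated from the observed grades (e.g., by maximum likelihood), rather than fixed as here.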
A document-pair likelihood model works best, and we extend it to pairwise learning to rank. By drawing more information directly from the input data, it outperforms existing state-of-the-art approaches to learning to rank from crowdsourced assessments. We validate the framework experimentally on four TREC datasets.
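One way a document-pair likelihood can feed a pairwise learner is sketched below. This is an illustrative sketch, not the paper's exact formulation: it assumes the inferred relevances are Gaussian, derives a soft preference probability for each pair, and uses it as the target in a RankNet-style cross-entropy objective. The function names are hypothetical.

```python
import math

def pair_prob(mu_i, mu_j, var_i, var_j):
    """P(doc i truly more relevant than doc j) when inferred relevances are
    Gaussian: the difference is Gaussian, so this is a normal CDF at 0."""
    diff_sd = math.sqrt(var_i + var_j)
    return 0.5 * (1.0 + math.erf((mu_i - mu_j) / (diff_sd * math.sqrt(2.0))))

def pair_loss(score_i, score_j, p_target):
    """Cross-entropy between the ranker's pair probability (logistic in the
    score difference) and the soft preference inferred from assessments."""
    p_model = 1.0 / (1.0 + math.exp(-(score_i - score_j)))
    return -(p_target * math.log(p_model)
             + (1.0 - p_target) * math.log(1.0 - p_model))
```

Using soft pair targets rather than hard majority-vote preferences lets uncertain pairs contribute proportionally less to the training gradient.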
Index Terms
- Aggregation of Crowdsourced Ordinal Assessments and Integration with Learning to Rank: A Latent Trait Model