DOI: 10.1145/2806416.2806492

Research Article

Aggregation of Crowdsourced Ordinal Assessments and Integration with Learning to Rank: A Latent Trait Model

Published: 17 October 2015

ABSTRACT

Existing approaches to training and evaluating search engines often rely on crowdsourced assessments of document relevance with respect to a user query. To make such assessments usable for either evaluation or learning, we propose a new framework for inferring true document relevance from crowdsourced data---one that is simpler than previous approaches and achieves better performance. For each assessor, we model quality and bias as Gaussian-distributed class conditionals over relevance grades. For each document, we model true relevance and difficulty as continuous variables. We estimate all parameters from crowdsourced data, demonstrating better inference of relevance as well as realistic models of both documents and assessors.
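As a rough, illustrative sketch (not the authors' implementation; the exact parameterization, the priors, and all variable names below are assumptions), the generative story described above can be simulated as follows: each document d carries a latent true relevance r_d and a difficulty sigma_d, and each assessor a grades the difficulty-perturbed relevance through its own per-grade Gaussian class conditionals, whose means encode that assessor's bias and whose spreads encode quality.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical sizes and parameterization (assumptions, not the paper's).
    n_docs, n_assessors, n_grades = 100, 5, 3

    true_relevance = rng.normal(0.0, 1.0, size=n_docs)   # r_d, per document
    difficulty = rng.gamma(2.0, 0.25, size=n_docs)       # sigma_d > 0

    # Per-assessor Gaussian class conditionals over the latent relevance axis:
    # grade_means[a] encodes assessor a's bias, grade_stds[a] its quality.
    grade_means = np.sort(
        rng.normal(loc=[-1.0, 0.0, 1.0], scale=0.3,
                   size=(n_assessors, n_grades)),
        axis=1)
    grade_stds = rng.gamma(2.0, 0.25, size=(n_assessors, n_grades))

    def sample_grade(d, a):
        """Perturb r_d by the document's difficulty, then report the ordinal
        grade whose class conditional best explains the perturbed value."""
        x = rng.normal(true_relevance[d], difficulty[d])
        log_lik = (-0.5 * ((x - grade_means[a]) / grade_stds[a]) ** 2
                   - np.log(grade_stds[a]))
        return int(np.argmax(log_lik))

    labels = np.array([[sample_grade(d, a) for a in range(n_assessors)]
                       for d in range(n_docs)])

Fitting such a model would run this process in reverse, e.g. by maximizing the likelihood of the observed labels over r_d, sigma_d, and the per-assessor parameters.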

A document-pair likelihood model performs best, and we extend it to pairwise learning to rank. By drawing more information directly from the input data, it outperforms existing state-of-the-art approaches to learning to rank from crowdsourced assessments. Experimental validation is performed on four TREC datasets.
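The abstract does not spell out the pairwise objective, but a minimal sketch of a document-pair preference under Gaussian posteriors over true relevance (an assumption on our part, not the paper's stated formulation) is the probit-style probability below; such soft preferences could then weight training pairs in a pairwise learning-to-rank loss such as LambdaMART's.

    from math import erf, sqrt

    def pair_preference_prob(mu_i, var_i, mu_j, var_j):
        """P(r_i > r_j) for independent r_i ~ N(mu_i, var_i) and
        r_j ~ N(mu_j, var_j): the difference r_i - r_j is Gaussian,
        so the probability is a standard normal CDF."""
        z = (mu_i - mu_j) / sqrt(var_i + var_j)
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))

    # Hypothetical use: replace hard, majority-vote pair preferences with
    # these soft probabilities when forming pairs for a pairwise ranker.
    print(pair_preference_prob(1.0, 0.2, 0.4, 0.3))  # ~0.80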


Published in

CIKM '15: Proceedings of the 24th ACM International Conference on Information and Knowledge Management
October 2015, 1998 pages
ISBN: 978-1-4503-3794-6
DOI: 10.1145/2806416
Copyright © 2015 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

CIKM '15 paper acceptance rate: 165 of 646 submissions, 26%
Overall acceptance rate: 1,861 of 8,427 submissions, 22%
