DOI: 10.1145/2600428.2609577
research-article

Multidimensional relevance modeling via psychometrics and crowdsourcing

Published: 03 July 2014

ABSTRACT

While many multidimensional models of relevance have been posited, prior studies have been largely exploratory rather than confirmatory. Lacking a methodological framework to quantify the relationships among factors or measure model fit to observed data, many past models could not be empirically tested or falsified. To enable more positivist experimentation, Xu and Chen [77] proposed a psychometric framework for multidimensional relevance modeling. However, we show that their framework exhibits several methodological limitations which could call into question the validity of findings drawn from it. In this work, we identify and address these limitations, scale their methodology via crowdsourcing, and describe quality control methods from psychometrics which stand to benefit crowdsourcing IR studies in general. The methodology we describe for relevance judging is expected to benefit both human-centered and systems-centered IR.

References

  1. Alonso, O. 2013. Implementing crowdsourcing-based relevance experimentation: an industrial perspective. Information Retrieval. 16, 2, 101--120.
  2. Anderson, J.C. and Gerbing, D.W. 1988. Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin. 103, 3, 411--423.
  3. Bailey, P. et al. 2008. Relevance assessment: are judges exchangeable and does it matter. SIGIR'08, 667--674.
  4. Balatsoukas, P. and Ruthven, I. 2012. An eye-tracking approach to the analysis of relevance judgments on the Web: The case of Google search engine. JASIST. 63, 9, 1728--1746.
  5. Bookstein, A. 1979. Relevance. JASIS. 30, 5, 269--273.
  6. Barry, C.L. 1994. User-defined relevance criteria: An exploratory study. JASIS. 45, 3, 149--159.
  7. Barry, C.L. and Schamber, L. 1998. Users' criteria for relevance evaluation: A cross-situational comparison. IP & M. 34, 2--3, 219--236.
  8. Bateman, J. 1998. Changes in Relevance Criteria: A Longitudinal Study. Proceedings of the ASIS Annual Meeting. 35, 23--32.
  9. Behrend, T.S. et al. 2011. The viability of crowdsourcing for survey research. Behavior Research Methods. 43, 3, 800--813.
  10. Blanco, R. et al. 2011. Repeatable and Reliable Search System Evaluation Using Crowdsourcing. Proceedings of SIGIR'2011 (New York, NY, USA), 923--932.
  11. Borlund, P. 2003. The concept of relevance in IR. JASIST. 54, 10, 913--925.
  12. Boyce, B. 1982. Beyond topicality: A two stage view of relevance and the retrieval process. IP & M. 18, 3, 105--109.
  13. Bradford, S.C. 1934. Sources of information on specific subjects. Engineering: An Illustrated Weekly Journal (London). 137, 26, 85--86.
  14. Browne, M.W. 2000. Psychometrics. Journal of the American Statistical Association. 95, 450, 661--665.
  15. Cacioppo, J.T. and Petty, R.E. 1984. The Elaboration Likelihood Model of Persuasion. Advances in Consumer Research. 11, 1, 673--675.
  16. Chouldechova, A. and Mease, D. 2013. Differences in Search Engine Evaluations Between Query Owners and Non-owners. Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, 103--112.
  17. Cognitive Interviewing: http://www.uk.sagepub.com/textbooks/Book225856?prodId=Book225856. Accessed: 2014-01-24.
  18. Cohen, J. 1988. Statistical power analysis for the behavioral sciences. L. Erlbaum Associates.
  19. Cool, C. et al. 1993. Characteristics of Texts affecting relevance judgements. Proceedings of the 14th National Online Meeting, 77--84.
  20. Da Costa Pereira, C. et al. 2012. Multidimensional relevance: Prioritized aggregation in a personalized Information Retrieval setting. IP & M. 48, 2, 340--357.
  21. Cuadra, C.A. and Katter, R.V. 1967. Opening the Black Box of "Relevance." Journal of Documentation. 23, 4, 291--303.
  22. Dwyer, J. 2002. Communication in Business: Strategies and Skills. Prentice Hall.
  23. Eickhoff, C. et al. 2013. Copulas for Information Retrieval. Proceedings of SIGIR'2013 (New York, NY, USA), 663--672.
  24. Eickhoff, C. and Vries, A.P. de 2013. Increasing cheat robustness of crowdsourcing tasks. Information Retrieval. 16, 2, 121--137.
  25. Franklin, S.B. et al. 1995. Parallel Analysis: a method for determining significant principal components. Journal of Vegetation Science. 6, 1, 99--106.
  26. Furr, M. 2011. Scale Construction and Psychometrics for Social and Personality Psychology. SAGE.
  27. Goldberg, L.R. and Kilkowski, J.M. 1985. The prediction of semantic consistency in self-descriptions: characteristics of persons and of terms that affect the consistency of responses to synonym and antonym pairs. Journal of Personality and Social Psychology. 48, 1, 82--98.
  28. Green, R. 1995. Topical relevance relationships. I. Why topic matching fails. JASIS. 46, 9, 646--653.
  29. Greisdorf, H. 2003. Relevance thresholds: a multi-stage predictive model of how users evaluate information. IP & M, 403--423.
  30. Grice, H.P. 1989. Studies in the way of words. Harvard University Press.
  31. Gwizdka, J. 2014. News Stories Relevance Effects on Eye-movements. Proceedings of the Symposium on Eye Tracking Research and Applications, 283--286.
  32. Harter, S.P. 1992. Psychological relevance and information science. JASIS. 43, 9, 602--615.
  33. Hatcher, L. 2013. Advanced statistics in research: reading, understanding, and writing up data analysis results. ShadowFinch Media, LLC.
  34. Hjørland, B. and Christensen, F.S. 2002. Work tasks and socio-cognitive relevance: A specific example. JASIST. 53, 11, 960--965.
  35. Hosseini, M. et al. 2012. On Aggregating Labels from Multiple Crowd Workers to Infer Relevance of Documents. Advances in Information Retrieval. R. Baeza-Yates et al., eds. Springer Berlin Heidelberg. 182--194.
  36. Hox, J.J. and Bechger, T.M. 2007. An introduction to structural equation modeling.
  37. Hu, L. and Bentler, P.M. 1999. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal. 6, 1, 1--55.
  38. Huang, X. and Soergel, D. 2013. Relevance: An improved framework for explicating the notion. JASIST. 64, 1, 18--35.
  39. Johnson, J.R. et al. 1981. Characteristics of Errors in Accounts Receivable and Inventory Audits. The Accounting Review. 56, 2, 270--293.
  40. Kazai, G. et al. 2012. An Analysis of Systematic Judging Errors in Information Retrieval. Proceedings of CIKM'2012 (New York, NY, USA), 105--114.
  41. Kazai, G. et al. 2011. Crowdsourcing for Book Search Evaluation: Impact of Hit Design on Comparative System Ranking. Proceedings of SIGIR'2011 (New York, NY, USA), 205--214.
  42. Kittur, A. et al. 2008. Crowdsourcing User Studies with Mechanical Turk. Proceedings of SIGCHI'2008 (New York, NY, USA), 453--456.
  43. Lancaster, F.W. 1968. Information retrieval systems: characteristics, testing, and evaluation. Wiley.
  44. Lesk, M.E. and Salton, G. 1968. Relevance assessments and retrieval system evaluation. Information Storage and Retrieval. 4, 4, 343--359.
  45. Levitin, A. and Redman, T. 1995. Quality dimensions of a conceptual view. IP & M. 31, 1, 81--88.
  46. Little, G. 2009. TurKit: Tools for iterative tasks on mechanical turk. IEEE Symposium on Visual Languages and Human-Centric Computing, 252--253.
  47. Liu, T.-Y. 2009. Learning to Rank for Information Retrieval. Found. Trends Inf. Retr. 3, 3, 225--331.
  48. Bentler, P.M. and Bonett, D.G. 1980. Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin. 88, 3, 588--606.
  49. Maron, M.E. 1977. On indexing, retrieval and the meaning of about. JASIS. 28, 1, 38--43.
  50. Marshall, C.C. and Shipman, F.M. 2013. Experiences Surveying the Crowd: Reflections on Methods, Participation, and Reliability. Proceedings of the 5th Annual ACM Web Science Conference, 234--243.
  51. Mizzaro, S. 1997. Relevance: The whole history. JASIS. 48, 9, 810--832.
  52. Moshfeghi, Y. et al. 2013. Understanding Relevance: An fMRI Study. Advances in Information Retrieval. P. Serdyukov et al., eds. Springer Berlin Heidelberg. 14--25.
  53. Mueller, R.O. and Hancock, G.R. 2008. Best practices in structural equation modeling. Best practices in quantitative methods. 488--508.
  54. Murphy, K.P. 2012. Machine Learning: A Probabilistic Perspective. MIT Press.
  55. Pearson - Modern Measurement: Theory, Principles, and Applications of Mental Appraisal, 2/E - Steven J. Osterlind: http://www.pearsonhighered.com/educator/product/Modern-Measurement-Theory-Principles-and-Applications-of-Mental-Appraisal/9780137010257.page. Accessed: 2014-01-24.
  56. Principles and Practice of Structural Equation Modeling: Third Edition: http://www.guilford.com/cgi-bin/cartscript.cgi?page=pr/kline.htm&dir=research/res_quant. Accessed: 2014-01-24.
  57. Proceedings of the International Conference on Scientific Information -- Two Volumes: http://books.nap.edu/openbook.php?record_id=10866&page=687. Accessed: 2014-01-26.
  58. Rees, A.M. and Schultz, D.G. 1967. A Field Experimental Approach to the Study of Relevance Assessments in Relation to Document Searching. Final Report to the National Science Foundation. Volume I.
  59. Relevance as process: judgements in the context of scholarly research: http://www.informationr.net/ir/10--2/paper226. Accessed: 2014-01-24.
  60. Sanderson, M. 2010. Test Collection Based Evaluation of Information Retrieval Systems. Foundations and Trends in Information Retrieval. 4, 4, 247--375.
  61. Saracevic, T. 2007. Relevance: A review of the literature and a framework for thinking on the notion in information science. Part II: nature and manifestations of relevance. JASIST. 58, 13, 1915--1933.
  62. Saracevic, T. 2007. Relevance: A review of the literature and a framework for thinking on the notion in information science. Part III: Behavior and effects of relevance. JASIST. 58, 13, 2126--2144.
  63. Schamber, L. 1994. Relevance and Information Behavior. Annual Review of Information Science and Technology (ARIST). 29, 3--48.
  64. Scheines, R. et al. 1999. Bayesian estimation and testing of structural equation models. Psychometrika. 64, 1, 37--52.
  65. Tabachnick, B.G. and Fidell, L.S. 2012. Using Multivariate Statistics. Pearson Education, Limited.
  66. Tang, R. and Solomon, P. 1998. Toward an understanding of the dynamics of relevance judgment: An analysis of one person's search behavior. IP & M. 34, 2--3, 237--256.
  67. Taylor, A.R. et al. 2007. Relationships between categories of relevance criteria and stage in task completion. IP & M. 43, 4, 1071--1084.
  68. The Social Construction of Meaning: An Alternative Perspective on Information Sharing: 2003. http://pubsonline.informs.org/doi/abs/10.1287/isre.14.1.87.14765. Accessed: 2014-01-24.
  69. Tsikrika, T. and Lalmas, M. 2007. Combining Evidence for Relevance Criteria: A Framework and Experiments in Web Retrieval. Advances in Information Retrieval. G. Amati et al., eds. Springer Berlin Heidelberg. 481--493.
  70. Vakkari, P. and Hakala, N. 2000. Changes in relevance criteria and problem stages in task performance. Journal of Documentation. 56, 5, 540--562.
  71. Voorhees, E.M. 1998. Variations in Relevance Judgments and the Measurement of Retrieval Effectiveness. Proceedings of SIGIR'1998 (New York, NY, USA), 315--323.
  72. Wilson, D. and Sperber, D. 2002. Relevance Theory. Handbook of Pragmatics. G. Ward and L. Horn, eds. Blackwell.
  73. De Winter, J.C.F. and Dodou, D. 2012. Factor recovery by principal axis factoring and maximum likelihood factor analysis as a function of factor pattern and sample size. Journal of Applied Statistics. 39, 4, 695--710.
  74. Worthington, R.L. and Whittaker, T.A. 2006. Scale Development Research: A Content Analysis and Recommendations for Best Practices. The Counseling Psychologist. 34, 6, 806--838.
  75. Wright, S. Correlation and causation.
  76. Xu, Y. (Calvin) and Chen, Z. 2006. Relevance judgment: What do information users consider beyond topicality? JASIST. 57, 7, 961--973.
  77. Zuccon, G. et al. 2013. Crowdsourcing interactions: using crowdsourcing for evaluating interactive information retrieval systems. Information Retrieval. 16, 2, 267--305.

    Published in

      SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
      July 2014
      1330 pages
      ISBN: 9781450322577
      DOI: 10.1145/2600428

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      SIGIR '14 Paper Acceptance Rate: 82 of 387 submissions, 21%
      Overall Acceptance Rate: 792 of 3,983 submissions, 20%