ABSTRACT
Evaluation is a fundamental part of a recommendation system. It typically takes one of three forms: (1) small-scale lab studies with real users; (2) offline batch tests using test collections, relevance judgments, and measures; (3) large-scale controlled experiments (e.g. A/B tests) based on implicit feedback. It is rare, however, for the first to inform and influence the latter two; in particular, implicit feedback metrics often have to be continuously revised and updated as their underlying assumptions turn out to be poorly supported.
Mixed methods research enables practitioners to develop robust evaluation metrics by combining the strengths of qualitative and quantitative approaches. In this tutorial, we show how qualitative research on user behavior provides insight into the relationship between implicit signals and satisfaction. These insights can inform and augment quantitative modeling and analysis for both online and offline metrics and evaluation.
Index Terms
- Mixed methods for evaluating user satisfaction