DOI: 10.1145/3099023.3099042
Research Article

Measuring Predictive Performance of User Models: The Details Matter

Published: 9 July 2017

ABSTRACT

Evaluation of user modeling techniques is often based on the predictive accuracy of models, which is quantified using performance metrics. We show that the choice of a performance metric is important and that even details of metric computation matter. We analyze in detail two commonly used metrics (AUC, RMSE) in the context of student modeling. We discuss different approaches to their computation (global, averaging across skills, averaging across students) and show that these approaches have different properties. An analysis of recent research papers shows that reported descriptions of metric computation are often insufficient. To make research conclusions valid and reproducible, researchers need to pay more attention to the choice of performance metrics and to describe the details of their computation more explicitly.
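The distinction between global and averaged metric computation that the abstract highlights can be illustrated with a short sketch. The Python snippet below uses made-up predictions for two hypothetical skills (the grouping could equally be by student; this is an illustration, not the paper's evaluation code) and computes RMSE and AUC once over the pooled observations and, alternatively, per group with the results averaged.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical data: for each skill, observed correctness (0/1) and the
# model's predicted probability of a correct answer. Values are invented
# purely to illustrate the two computation schemes.
data = {
    "skill_A": (np.array([1, 1, 0, 1, 0]), np.array([0.9, 0.8, 0.4, 0.7, 0.3])),
    "skill_B": (np.array([0, 1, 0, 1, 0]), np.array([0.1, 0.3, 0.2, 0.4, 0.15])),
}

def rmse(y_true, y_pred):
    """Root mean squared error of probabilistic predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Global computation: pool all observations, then compute each metric once.
y_true = np.concatenate([t for t, _ in data.values()])
y_pred = np.concatenate([p for _, p in data.values()])
print("global RMSE:   ", rmse(y_true, y_pred))
print("global AUC:    ", roc_auc_score(y_true, y_pred))

# Averaged computation: compute each metric per skill, then average the results.
print("averaged RMSE: ", np.mean([rmse(t, p) for t, p in data.values()]))
print("averaged AUC:  ", np.mean([roc_auc_score(t, p) for t, p in data.values()]))
```

With this data, each per-skill AUC is 1 (within each skill the model ranks correct answers above incorrect ones), yet the global AUC is below 1 because the global computation also compares predictions across skills; the two RMSE variants differ as well. Averaging moreover weights each skill (or student) equally regardless of how many observations it contributes, whereas the global computation weights each observation equally. This is why a reported metric value is ambiguous without a description of how it was computed.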


Published in

UMAP '17: Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization
July 2017, 456 pages
ISBN: 978-1-4503-5067-9
DOI: 10.1145/3099023
Copyright © 2017 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 162 of 633 submissions, 26%
