Measuring Predictive Performance of User Models: The Details Matter

ABSTRACT
Evaluation of user modeling techniques is often based on the predictive accuracy of models, which is quantified using performance metrics. We show that the choice of performance metric is important and that even the details of metric computation matter. We analyze in detail two commonly used metrics (AUC, RMSE) in the context of student modeling. We discuss different approaches to their computation (global, averaging across skills, averaging across students) and show that these approaches have different properties. An analysis of recent research papers shows that the reported descriptions of metric computation are often insufficient. To make research conclusions valid and reproducible, researchers need to pay more attention to the choice of performance metrics, and they need to describe the details of their computation more explicitly.
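The distinction between "global" and "averaged" metric computation mentioned in the abstract can be made concrete with a small sketch. The following is an illustrative example (not code from the paper; the data and names are assumptions): it computes AUC once over all pooled predictions and once per student with the results averaged, on a toy dataset where the two approaches disagree.

```python
def roc_auc(y_true, y_score):
    """AUC via the Mann-Whitney U statistic (ties counted as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predictions grouped by student:
# (correct answer?, predicted probability of a correct answer)
by_student = {
    "s1": [(1, 0.9), (0, 0.6)],  # scores for this student sit high
    "s2": [(1, 0.3), (0, 0.1)],  # same ordering, but shifted lower
}

# Global AUC: pool all observations and rank across students.
all_obs = [obs for obs_list in by_student.values() for obs in obs_list]
auc_global = roc_auc([y for y, _ in all_obs], [s for _, s in all_obs])

# Averaged AUC: compute AUC per student, then take the mean.
per_student = [roc_auc([y for y, _ in obs], [s for _, s in obs])
               for obs in by_student.values()]
auc_averaged = sum(per_student) / len(per_student)
```

Here each student's predictions are perfectly ordered (per-student AUC of 1.0), yet pooling mixes the two students' score ranges and the global AUC falls to 0.75 — the kind of discrepancy that makes underspecified metric computation a reproducibility problem.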