ABSTRACT
Course instructors need to be able to identify students in need of assistance as early in the course as possible. Recent work has suggested that machine learning approaches applied to snapshots of small programming exercises may be an effective solution to this problem. However, these results have been obtained using data from a single institution, and prior work using features extracted from student code has been highly sensitive to differences in context. This work provides two contributions: first, a partial reproduction of previously published results, but in a different context, and second, an exploration of the efficacy of neural networks in solving this problem. Our findings confirm the importance of two features (the number of steps required to solve a problem and the correctness of key problems), indicate that machine learning techniques are relatively stable across contexts (both across terms in a single course and across courses), and suggest that neural-network-based approaches are as effective as the best Bayesian and decision tree methods. Furthermore, neural networks can be tuned to be reliably pessimistic, so they may serve a complementary role in solving the problem of identifying students who need assistance.
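The idea of tuning a classifier to be "reliably pessimistic" can be sketched as lowering the decision threshold so that more students are flagged as potentially needing assistance (trading precision for recall). The sketch below is illustrative only, not the paper's implementation: the feature names, the synthetic data, and the specific threshold value are all assumptions chosen for demonstration.

```python
# Illustrative sketch (assumed setup, not the paper's pipeline): bias a
# neural network classifier toward pessimistic predictions by lowering
# the decision threshold on its predicted probabilities.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Hypothetical per-student features echoing the two the abstract highlights:
# number of steps taken on an exercise, and correctness of a key problem.
n = 200
steps = rng.integers(1, 30, size=n)
correct = rng.integers(0, 2, size=n)
X = np.column_stack([steps, correct])
# Hypothetical label: 1 = student needed assistance.
y = ((steps > 15) & (correct == 0)).astype(int)

clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X, y)

p = clf.predict_proba(X)[:, 1]
default_flags = int((p >= 0.5).sum())      # standard decision threshold
pessimistic_flags = int((p >= 0.2).sum())  # lower threshold flags more students
print(default_flags, pessimistic_flags)
```

Because every probability above 0.5 is also above 0.2, the pessimistic threshold can only flag at least as many students as the default one; the cost is more false positives, which is acceptable when the goal is early outreach rather than precise prediction.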
Evaluating Neural Networks as a Method for Identifying Students in Need of Assistance