research-article

Codewebs: scalable homework search for massive open online programming courses

Authors: Christopher Piech, Jonathan Huang, Leonidas Guibas

WWW '14: Proceedings of the 23rd international conference on World wide web

Pages 491 - 502

https://doi.org/10.1145/2566486.2568023

Published: 07 April 2014

Abstract

Massive open online courses (MOOCs), one of the latest internet revolutions, have engendered hope that constant iterative improvement and economies of scale may cure the "cost disease" of higher education. While MOOCs scale well in many ways, providing feedback on homework submissions (particularly open-ended ones) remains a challenge in the online classroom. In courses where the student-teacher ratio can be ten thousand to one or worse, it is impossible for instructors to personally give feedback to students or to understand the multitude of student approaches and pitfalls. Organizing and making sense of massive collections of homework solutions is thus a critical web problem. Despite the challenges, the dense sampling of the solution space in the highly structured homework assignments of some MOOCs suggests an elegant way to provide quality feedback to students on a massive scale.

We outline a method for decomposing online homework submissions into a vocabulary of "code phrases" and, based on this vocabulary, we architect a queryable index that allows for fast searches into the massive dataset of student homework submissions. To demonstrate the utility of our homework search engine, we index over a million code submissions from users worldwide in Stanford's Machine Learning MOOC and (a) semi-automatically learn shared structure among homework submissions and (b) generate specific feedback for student mistakes.
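As a rough illustration of the idea of a queryable index over code phrases, the following is a minimal sketch and not the authors' implementation. It assumes that a "code phrase" is simply an expression subtree of the abstract syntax tree, uses Python source and Python's standard `ast` module in place of the Octave submissions described here, and matches phrases only when they are syntactically identical. The names `code_phrases` and `PhraseIndex` are hypothetical.

```python
import ast
from collections import defaultdict


def code_phrases(source):
    """Return a canonical string for every expression subtree in `source`.

    Assumption: a "code phrase" is modeled as one AST expression subtree.
    """
    tree = ast.parse(source)
    return {ast.dump(node) for node in ast.walk(tree) if isinstance(node, ast.expr)}


class PhraseIndex:
    """Inverted index mapping each code phrase to the submissions containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, submission_id, source):
        # Record the submission under every phrase that occurs in it.
        for phrase in code_phrases(source):
            self._postings[phrase].add(submission_id)

    def search(self, phrase_source):
        """Return ids of submissions that contain the given expression."""
        node = ast.parse(phrase_source, mode="eval").body
        return set(self._postings.get(ast.dump(node), set()))


# Three toy "submissions" for a gradient descent update step.
index = PhraseIndex()
index.add(1, "theta = theta - alpha * grad")
index.add(2, "step = alpha * grad\ntheta = theta - step")
index.add(3, "theta = theta + alpha * grad")

shared = index.search("alpha * grad")
exact = index.search("theta - alpha * grad")
```

In this toy setup, `shared` contains all three submissions because each computes `alpha * grad`, while `exact` contains only submission 1. A lookup costs one dictionary access regardless of how many submissions are indexed, which is the property that makes an inverted index attractive at this scale.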

Codewebs is a tool that leverages the redundancy of densely sampled, highly structured homework assignments in order to force-multiply teacher effort. Giving articulate, instant feedback is a crucial component of the online learning process; thus, by building a homework search engine, we hope to take a step towards higher quality free education.

Index Terms

Codewebs: scalable homework search for massive open online programming courses
1. Computing methodologies
  1. Artificial intelligence
    1. Philosophical/theoretical foundations of artificial intelligence
2. Information systems

Published In

WWW '14: Proceedings of the 23rd international conference on World wide web

April 2014

926 pages

ISBN:9781450327442

DOI:10.1145/2566486

General Chair: Chin-Wan Chung (Korea Advanced Institute of Science and Technology, Korea)

Program Chairs: Andrei Broder (Google Inc., USA), Kyuseok Shim (Seoul National University, Korea), Torsten Suel (New York University, USA)

Copyright © 2014 Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Sponsors

IW3C2: International World Wide Web Conference Committee

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Computing Research Association
Google
Max Planck Center for Visual Computing and Communications
Air Force Office of Scientific Research
Division of Mathematical Sciences
Division of Computing and Communication Foundations
National Science Foundation

Conference

WWW '14

Sponsor:

IW3C2

WWW '14: 23rd International World Wide Web Conference

April 7 - 11, 2014

Seoul, Korea

Acceptance Rates

WWW '14 Paper Acceptance Rate 84 of 645 submissions, 13%;

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

Total Citations: 70
Total Downloads: 758
Downloads (last 12 months): 41
Downloads (last 6 weeks): 8

Reflects downloads up to 19 Feb 2025
