DOI: 10.1145/2566486.2568023

Codewebs: scalable homework search for massive open online programming courses

Published: 07 April 2014

Abstract

Massive open online courses (MOOCs), one of the latest internet revolutions, have engendered hope that constant iterative improvement and economies of scale may cure the "cost disease" of higher education. While MOOCs are scalable in many ways, providing feedback for homework submissions (particularly open-ended ones) remains a challenge in the online classroom. In courses where the student-teacher ratio can be ten thousand to one or worse, it is impossible for instructors to personally give feedback to students or to understand the multitude of student approaches and pitfalls. Organizing and making sense of massive collections of homework solutions is thus a critical web problem. Despite the challenges, the dense sampling of the solution space in highly structured homework assignments for some MOOCs suggests an elegant way to provide quality feedback to students on a massive scale.
We outline a method for decomposing online homework submissions into a vocabulary of "code phrases", and, based on this vocabulary, we architect a queryable index that allows for fast searches into the massive dataset of student homework submissions. To demonstrate the utility of our homework search engine, we index over a million code submissions from users worldwide in Stanford's Machine Learning MOOC and (a) semi-automatically learn shared structure amongst homework submissions and (b) generate specific feedback for student mistakes.
Codewebs is a tool that leverages the redundancy of densely sampled, highly structured homework assignments in order to force-multiply teacher effort. Giving articulate, instant feedback is a crucial component of the online learning process, and by building a homework search engine we hope to take a step towards higher-quality free education.
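
To make the idea of a queryable index over "code phrases" concrete, here is a minimal sketch. It is not the Codewebs implementation: it assumes that a code phrase can be approximated by a serialized syntax-tree subtree, it parses Python with the standard `ast` module (the course homework was written in Octave), and the names `code_phrases`, `build_index`, `search`, and the three toy submissions are hypothetical.

```python
import ast
from collections import defaultdict


def code_phrases(source):
    """Yield a string key for every expression or statement subtree.

    Assumption: a "code phrase" is approximated by the ast.dump() of a subtree.
    """
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.expr, ast.stmt)):
            yield ast.dump(node)


def build_index(submissions):
    """Build an inverted index: phrase key -> set of submission ids."""
    index = defaultdict(set)
    for sub_id, source in submissions.items():
        for phrase in code_phrases(source):
            index[phrase].add(sub_id)
    return index


def search(index, query):
    """Return sorted ids of submissions that contain the query expression."""
    key = ast.dump(ast.parse(query, mode="eval").body)
    return sorted(index.get(key, set()))


# Hypothetical toy submissions for a squared-error cost function.
submissions = {
    "s1": "h = X @ theta\nJ = ((h - y) ** 2).sum() / (2 * m)",
    "s2": "J = ((X @ theta - y) ** 2).sum() / (2 * m)",
    "s3": "J = sum((X.dot(theta) - y) ** 2) / (2 * m)",
}
index = build_index(submissions)

print(search(index, "X @ theta"))     # submissions that use the @ operator
print(search(index, "X.dot(theta)"))  # submissions that call .dot instead
print(search(index, "2 * m"))         # a phrase shared by all three
```

A lookup is a single dictionary access no matter how many submissions are indexed, which is what makes this kind of index attractive at the scale the abstract describes. The sketch matches only syntactically identical subtrees; it does not attempt the semantic equivalence that the author tags mention.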


      Published In

      WWW '14: Proceedings of the 23rd international conference on World wide web
      April 2014
      926 pages
      ISBN:9781450327442
      DOI:10.1145/2566486

      Sponsors

      • IW3C2: International World Wide Web Conference Committee

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. AST
      2. MOOC
      3. abstract syntax tree
      4. canonicalization
      5. code search
      6. coursera
      7. education
      8. massive open online course
      9. octave
      10. semantic equivalence
      11. student feedback

      Qualifiers

      • Research-article

      Conference

      WWW '14
      Sponsor:
      • IW3C2

      Acceptance Rates

      WWW '14 Paper Acceptance Rate 84 of 645 submissions, 13%;
      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

      Cited By

      • (2024) A Systematic Review of Application of Machine Learning in Curriculum Design Among Higher Education. Journal of Emerging Computer Technologies 4:1 (15-24). DOI: 10.57020/ject.1475566. Online publication date: 31-Dec-2024
      • (2024) CFlow: Supporting Semantic Flow Analysis of Students' Code in Programming Problems at Scale. Proceedings of the Eleventh ACM Conference on Learning @ Scale (188-199). DOI: 10.1145/3657604.3662025. Online publication date: 9-Jul-2024
      • (2024) Open Source Language Models Can Provide Feedback: Evaluating LLMs' Ability to Help Students Using GPT-4-As-A-Judge. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1 (52-58). DOI: 10.1145/3649217.3653612. Online publication date: 3-Jul-2024
      • (2024) Auglets: Intelligent Tutors for Learning Good Coding Practices by Solving Refactoring Problems. Proceedings of the 2024 on ACM Virtual Global Computing Education Conference V. 1 (95-101). DOI: 10.1145/3649165.3690119. Online publication date: 5-Dec-2024
      • (2024) Learning with Style: Improving Student Code-Style Through Better Automated Feedback. Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1 (1175-1181). DOI: 10.1145/3626252.3630889. Online publication date: 7-Mar-2024
      • (2024) Assessing student perceptions and use of instructor versus AI-generated feedback. British Journal of Educational Technology. DOI: 10.1111/bjet.13558. Online publication date: 27-Dec-2024
      • (2024) Flexible control flow graph alignment for delivering data-driven feedback to novice programming learners. Journal of Systems and Software (111960). DOI: 10.1016/j.jss.2024.111960. Online publication date: Jan-2024
      • (2023) Helping to provide adaptive feedback to novice programmers: a framework to assist the Teachers. 2023 18th Iberian Conference on Information Systems and Technologies (CISTI) (1-6). DOI: 10.23919/CISTI58278.2023.10212000. Online publication date: 20-Jun-2023
      • (2023) Evaluating the Quality of LLM-Generated Explanations for Logical Errors in CS1 Student Programs. Proceedings of the 16th Annual ACM India Compute Conference (49-54). DOI: 10.1145/3627217.3627233. Online publication date: 9-Dec-2023
      • (2023) The Student Zipf Theory: Inferring Latent Structures in Open-Ended Student Work To Help Educators. LAK23: 13th International Learning Analytics and Knowledge Conference (464-475). DOI: 10.1145/3576050.3576116. Online publication date: 13-Mar-2023
