DOI: 10.1145/3173574.3173748
research-article
Public Access

The Story in the Notebook: Exploratory Data Science using a Literate Programming Tool

Published: 19 April 2018

ABSTRACT

Literate programming tools are used by millions of programmers today and are intended to facilitate presenting data analyses in the form of a narrative. We interviewed 21 data scientists to study coding behaviors in a literate programming environment and how data scientists kept track of the variants they explored. For participants who tried to keep a detailed history of their experimentation, both informal and formal versioning attempts led to problems, such as reduced notebook readability. During iteration, participants actively curated their notebooks into narratives, although primarily through cell structure rather than markdown explanations. Next, we surveyed 45 data scientists and asked them to envision how they might use their past history in a future version control system. Based on these results, we give design guidance for future literate programming tools, such as providing history search based on how programmers recall their explorations, through contextual details including images and parameters.


Supplemental Material

pn2142-file5.mp4 (MP4, 8.9 MB)

pn2142.mp4 (MP4, 283.4 MB)


      Published in

        ACM Conferences
        CHI '18: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems
        April 2018
        8489 pages
        ISBN: 9781450356206
        DOI: 10.1145/3173574

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        CHI '18 paper acceptance rate: 666 of 2,590 submissions (26%). Overall acceptance rate: 6,199 of 26,314 submissions (24%).
