ABSTRACT
Literate programming tools are used by millions of programmers today, and are intended to facilitate presenting data analyses in the form of a narrative. We interviewed 21 data scientists to study how they code in a literate programming environment and how they keep track of the variants they explore. For participants who tried to keep a detailed history of their experimentation, both informal and formal versioning attempts led to problems, such as reduced notebook readability. During iteration, participants actively curated their notebooks into narratives, although primarily through cell structure rather than markdown explanations. Next, we surveyed 45 data scientists, asking them to envision how they might use their past history in a future version control system. Based on these results, we offer design guidance for future literate programming tools, such as history search keyed to how programmers recall their explorations, using contextual details including output images and parameter values.
The Story in the Notebook: Exploratory Data Science using a Literate Programming Tool