
An Effective Git And Org-Mode Based Workflow For Reproducible Research

Published: 20 January 2015

Abstract

In this paper we address the question of developing a lightweight and effective workflow for conducting experimental research on modern parallel computer systems in a reproducible way. Our approach builds on two well-known tools (Git and Org-mode) and addresses, at least partially, issues such as running experiments, provenance tracking, experimental setup reconstruction, and replicable analysis. We have been using this methodology for two years now, and it recently enabled us to publish a fully reproducible article [12]. To fully demonstrate the effectiveness of our proposal, we have opened our two-year laboratory notebook along with all the attached experimental data. This notebook and the underlying Git revision control system make it possible to illustrate and better understand the workflow we used.
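As a rough illustration of the kind of laboratory-notebook entry such a Git and Org-mode workflow produces, here is a minimal sketch of a hypothetical journal.org entry using Org Babel shell blocks. The machine name, benchmark command, file layout, and commit message are illustrative assumptions, not the authors' conventions; their actual notebook, opened as described above, shows the real structure.

* 2014-03-12 DGEMM sweep on node foo                              :experiment:
** Environment capture
#+begin_src sh :results output :exports both
  uname -a
  gcc --version | head -n 1
  git rev-parse HEAD      # record the exact code revision used for this run
#+end_src

** Run
#+begin_src sh :results output :exports code
  mkdir -p data/foo
  ./bench_dgemm --sizes 1024:8192:1024 --reps 30 > data/foo/dgemm.csv
#+end_src

** Analysis
#+begin_src sh :results output :exports both
  # quick sanity check: mean of the (hypothetical) GFLOPS column
  awk -F, 'NR > 1 { s += $2; n++ } END { printf "mean GFLOPS: %.2f\n", s / n }' data/foo/dgemm.csv
#+end_src

** Commit
#+begin_src sh :results silent
  git add journal.org data/foo/dgemm.csv
  git commit -m "foo: DGEMM sweep, 30 repetitions per size"
#+end_src

Keeping the commands, their captured output, and the produced data files in one versioned notebook is what lets the environment and analysis behind any past result be reconstructed later from the Git history.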

References

[1]
C. Augonnet, S. Thibault, R. Namyst, and P.-A. Wacrenier. StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures. Concurrency and Computation: Practice and Experience, 23:187--198, Feb. 2011.
[2]
T. Buchert, L. Nussbaum, and J. Gustedt. A workflow-inspired, modular and robust approach to experiments in distributed systems. Research Report RR-8404, INRIA, Nov. 2013.
[3]
H. Casanova, A. Giersch, A. Legrand, M. Quinson, and F. Suter. Versatile, Scalable, and Accurate Simulation of Distributed Applications and Platforms. Journal of Parallel and Distributed Computing, 74(10):2899--2917, June 2014.
[4]
C. Drummond. Replicability is not reproducibility: Nor is it good science. In Proceedings of the Evaluation Methods for Machine Learning Workshop at the 26th ICML, 2009.
[5]
K. Hinsen. A data and code model for reproducible research and executable papers. Procedia Computer Science, 4(0):579--588, 2011. Proceedings of the International Conference on Computational Science.
[6]
M. Imbert, L. Pouilloux, J. Rouzaud-Cornabas, A. Lèbre, and T. Hirofuchi. Using the EXECO toolbox to perform automatic and reproducible cloud experiments. In 1st International Workshop on UsiNg and building ClOud Testbeds (UNICO, collocated with IEEE CloudCom 2013), Sept. 2013.
[7]
J. Mirkovic, T. Benzel, S. Schwab, J. Wroclawski, T. Faber, and B. Braden. The DETER Project: Advancing the Science of Cyber Security Experimentation and Test. In Proceedings of the IEEE Homeland Security Technologies Conference (IEEE HST), 2010.
[8]
T. Mytkowicz, A. Diwan, M. Hauswirth, and P. F. Sweeney. Producing wrong data without doing anything obviously wrong! In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XIV, pages 265--276. ACM, 2009.
[9]
C. Ruiz, O. Richard, and J. Emeras. Reproducible software appliances for experimentation. In Proceedings of the 9th International ICST Conference on Testbeds and Research Infrastructures for the Development of Networks and Communities (Tridentcom), 2014.
[10]
C. C. Ruiz Sanabria, O. Richard, B. Videau, and I. Oleg. Managing large scale experiments in distributed testbeds. In Proceedings of the 11th IASTED International Conference. ACTA Press, 2013.
[11]
E. Schulte, D. Davison, T. Dye, and C. Dominik. A multi-language computing environment for literate programming and reproducible research. Journal of Statistical Software, 46(3):1--24, Jan. 2012.
[12]
L. Stanisic, S. Thibault, A. Legrand, B. Videau, and J.-F. Méhaut. Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-Core Architectures. In Proceedings of the 20th Euro-Par Conference. Springer-Verlag, Aug. 2014.
[13]
L. Stanisic, B. Videau, J. Cronsioe, A. Degomme, V. Marangozova-Martin, A. Legrand, and J.-F. Méhaut. Performance analysis of HPC applications on low-power embedded platforms. In Proceedings of the Conference on Design, Automation and Test in Europe, DATE '13, pages 475--480. EDA Consortium, 2013.
[14]
V. Stodden, F. Leisch, and R. D. Peng, editors. Implementing Reproducible Research. The R Series. Chapman and Hall/CRC, Apr. 2014.
[15]
Companion of the StarPU+SimGrid article. Hosted on Figshare: http://dx.doi.org/10.6084/m9.figshare.928338, 2014. Online version of [12] with access to the experimental data and scripts (in the Org source).


Published In

ACM SIGOPS Operating Systems Review, Volume 49, Issue 1
Special Issue on Repeatability and Sharing of Experimental Artifacts
January 2015, 155 pages
ISSN: 0163-5980
DOI: 10.1145/2723872

Publisher

Association for Computing Machinery, New York, NY, United States



