ABSTRACT
Computer systems research spans sub-disciplines that include embedded and real-time systems, compilers, networking, and operating systems. Our contention is that a number of structural factors inhibit quality research and decrease the velocity of science. We highlight some of the factors we have encountered in our work and observed in published papers, and propose solutions that, if widely adopted, could increase both the productivity of researchers and the quality of their output.
Index Terms
- Repeatability, reproducibility, and rigor in systems research