ABSTRACT
Computer systems research spans sub-disciplines that include embedded and real-time systems, compilers, networking, and operating systems. Our contention is that a number of structural factors inhibit quality research and decrease the velocity of science. We highlight some of the factors we have encountered in our work and observed in published papers, and propose solutions that, if widely adopted, could increase both the productivity of researchers and the quality of their output.
Index Terms
- Repeatability, reproducibility, and rigor in systems research