ABSTRACT
Empirical systems research faces a dilemma. Minor aspects of an experimental setup can have a significant impact on the associated performance measurements and potentially invalidate the conclusions drawn from them. Examples of such influences, often called hidden factors, include binary link order, process environment size, compiler-generated randomized symbol names, and group scheduler assignments. The growing complexity and size of modern systems will further aggravate this dilemma, especially given the time pressure to produce results. So how can one trust any reported empirical analysis of a new idea or concept in computer science?
This paper introduces DataMill, a community-based, easy-to-use, service-oriented open benchmarking infrastructure for performance evaluation. DataMill facilitates producing robust, reliable, and reproducible results. The infrastructure incorporates the latest findings on hidden factors and automates the variation of these factors. Multiple research groups already participate in DataMill.
DataMill is also of interest for research on performance evaluation itself. The infrastructure supports quantifying the effect of hidden factors and disseminating the results of that research beyond mere reporting, and it provides a platform for investigating the interaction and composition of hidden factors.
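As an illustration of one hidden factor named above, the following minimal sketch (in Python, not DataMill's actual interface; the ./benchmark path, padding sizes, and run count are assumptions) varies the process environment size around repeated runs of a benchmark so that its influence on measured wall-clock time becomes visible.

    # Minimal sketch: vary one hidden factor (process environment size) and
    # observe its effect on a benchmark's wall-clock time. The ./benchmark
    # path, padding sizes, and repetition count are illustrative assumptions.
    import subprocess
    import time

    BENCHMARK = "./benchmark"               # assumed benchmark executable
    ENV_PADDINGS = [0, 1024, 4096, 16384]   # bytes added to the environment
    RUNS = 30                               # repetitions per configuration

    def run_once(padding):
        # Build a minimal environment and pad it with a dummy variable.
        env = {"PATH": "/usr/bin:/bin", "PADDING": "x" * padding}
        start = time.perf_counter()
        subprocess.run([BENCHMARK], env=env, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return time.perf_counter() - start

    for padding in ENV_PADDINGS:
        samples = [run_once(padding) for _ in range(RUNS)]
        mean = sum(samples) / len(samples)
        print("env padding %6d B: mean %.4f s over %d runs" % (padding, mean, RUNS))

An infrastructure such as DataMill would vary this factor jointly with others (e.g., link order or scheduler assignment) across many machines and configurations rather than one at a time on a single host.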
Supplemental Material
Available for Download
This data archive accompanies the ICPE '13 paper "DataMill: Rigorous Performance Evaluation Made Easy" by Augusto Born de Oliveira, Jean-Christophe Petkovich, Thomas Reidemeister, and Sebastian Fischmeister. Questions about the contents of the archive may be directed to [email protected].