ABSTRACT
Empirical systems research faces a dilemma. Minor aspects of an experimental setup can have a significant impact on the associated performance measurements and potentially invalidate the conclusions drawn from them. Examples of such influences, often called hidden factors, include binary link order, process environment size, compiler-generated randomized symbol names, and group scheduler assignments. The growing complexity and size of modern systems will further aggravate this dilemma, especially given the time pressure to produce results. So how can one trust any reported empirical analysis of a new idea or concept in computer science?
This paper introduces DataMill, a community-based, easy-to-use, service-oriented open benchmarking infrastructure for performance evaluation. DataMill facilitates producing robust, reliable, and reproducible results. The infrastructure incorporates the latest findings on hidden factors and automates the variation of these factors. Multiple research groups already participate in DataMill.
DataMill is also of interest for research on performance evaluation itself. The infrastructure supports quantifying the effect of hidden factors and disseminating the results of that research beyond mere reporting, and it provides a platform for investigating the interaction and composition of hidden factors.
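As an illustration of one hidden factor named above, the following minimal sketch (in Python, not DataMill's actual interface; the ./benchmark path, padding sizes, and run count are assumptions) varies the process environment size around repeated runs of a benchmark so that its influence on measured wall-clock time becomes visible.

    # Minimal sketch: vary one hidden factor (process environment size) and
    # observe its effect on a benchmark's wall-clock time. The ./benchmark
    # path, padding sizes, and repetition count are illustrative assumptions.
    import subprocess
    import time

    BENCHMARK = "./benchmark"               # assumed benchmark executable
    ENV_PADDINGS = [0, 1024, 4096, 16384]   # bytes added to the environment
    RUNS = 30                               # repetitions per configuration

    def run_once(padding):
        # Build a minimal environment and pad it with a dummy variable.
        env = {"PATH": "/usr/bin:/bin", "PADDING": "x" * padding}
        start = time.perf_counter()
        subprocess.run([BENCHMARK], env=env, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return time.perf_counter() - start

    for padding in ENV_PADDINGS:
        samples = [run_once(padding) for _ in range(RUNS)]
        mean = sum(samples) / len(samples)
        print("env padding %6d B: mean %.4f s over %d runs" % (padding, mean, RUNS))

An infrastructure such as DataMill would vary this factor jointly with others (e.g., link order or scheduler assignment) across many machines and configurations rather than one at a time on a single host.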
Supplemental Material
Available for Download
This data archive accompanies the ICPE '13 paper "DataMill: Rigorous Performance Evaluation Made Easy" by Augusto Born de Oliveira, Jean-Christophe Petkovich, Thomas Reidemeister, and Sebastian Fischmeister. Questions about the contents of the archive may be directed to [email protected].