
DataMill: rigorous performance evaluation made easy

Research article · Published: 21 April 2013 · DOI: 10.1145/2479871.2479892

ABSTRACT

Empirical systems research is facing a dilemma. Minor aspects of an experimental setup can have a significant impact on its associated performance measurements and potentially invalidate the conclusions drawn from them. Examples of such influences, often called hidden factors, include binary link order, process environment size, compiler-generated randomized symbol names, and group scheduler assignments. The growing complexity and size of modern systems will further aggravate this dilemma, especially given the time pressure to produce results. So how can one trust any reported empirical analysis of a new idea or concept in computer science?
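To make the impact of such hidden factors concrete, the sketch below (an illustration added here, not part of the paper) times a hypothetical benchmark binary while padding the process environment with a dummy variable of varying size, one of the factors listed above. The command name, padding sizes, and repetition count are all assumptions chosen for illustration.

```python
#!/usr/bin/env python3
"""Illustrative probe of one hidden factor: process environment size."""
import os
import subprocess
import time

BENCH_CMD = ["./my_benchmark"]    # hypothetical benchmark binary
PAD_SIZES = [0, 512, 1024, 4096]  # bytes of environment padding to try
REPS = 10                         # repetitions per padding size

def run_once(env):
    """Run the benchmark once under the given environment and return seconds."""
    start = time.perf_counter()
    subprocess.run(BENCH_CMD, env=env, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start

for pad in PAD_SIZES:
    env = dict(os.environ)
    env["BENCH_PADDING"] = "x" * pad  # the only deliberate change per group
    times = [run_once(env) for _ in range(REPS)]
    print(f"padding={pad:5d}B  mean={sum(times)/REPS:.4f}s  "
          f"min={min(times):.4f}s  max={max(times):.4f}s")
```

If the mean runtime shifts with the padding size alone, part of the measured difference stems from the environment layout rather than the workload itself.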

This paper introduces DataMill, a community-based, easy-to-use, service-oriented open benchmarking infrastructure for performance evaluation. DataMill facilitates producing robust, reliable, and reproducible results. The infrastructure incorporates the latest findings on hidden factors and automates the variation of these factors. Multiple research groups already participate in DataMill.

DataMill is also of interest for research on performance evaluation itself. The infrastructure supports quantifying the effect of hidden factors and disseminating these research results beyond mere reporting. It provides a platform for investigating the interactions and composition of hidden factors.
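As a rough sketch of what automating the variation of such factors can look like (this is not DataMill's actual design; the factor names and levels below are assumptions), a full-factorial enumeration turns every combination of factor levels into one experiment configuration, so that main effects and interactions can later be estimated:

```python
"""Illustrative full-factorial enumeration of experiment configurations."""
from itertools import product

# Hypothetical hidden factors and the levels to vary them over.
FACTORS = {
    "link_order":    ["default", "alphabetical", "random"],
    "env_padding":   [0, 1024, 4096],   # bytes
    "compiler_opts": ["-O2", "-O3"],
}

names = list(FACTORS)
# One experiment per combination of factor levels.
configs = [dict(zip(names, levels)) for levels in product(*FACTORS.values())]

for i, cfg in enumerate(configs):
    print(f"experiment {i:02d}: {cfg}")
print(f"total experiments: {len(configs)}")
```

A full-factorial design grows multiplicatively with the number of factor levels, which is one reason an infrastructure that distributes and automates these runs is attractive.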

Published in

ICPE '13: Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering
April 2013, 446 pages
ISBN: 9781450316361
DOI: 10.1145/2479871

          Copyright © 2013 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

ICPE '13 paper acceptance rate: 28 of 64 submissions (44%). Overall acceptance rate: 252 of 851 submissions (30%).
