Abstract
In this paper we present a profiling methodology and toolkit for helping developers discover hidden asymptotic inefficiencies in the code. From one or more runs of a program, our profiler automatically measures how the performance of individual routines scales as a function of the input size, yielding clues to their growth rate. The output of the profiler is, for each executed routine of the program, a set of tuples that aggregate performance costs by input size. The collected profiles can be used to produce performance plots and derive trend functions by statistical curve fitting or bounding techniques. A key feature of our method is the ability to automatically measure the size of the input given to a generic code fragment: to this aim, we propose an effective metric for estimating the input size of a routine and show how to compute it efficiently. We discuss several case studies, showing that our approach can reveal asymptotic bottlenecks that other profilers may fail to detect and characterize the workload and behavior of individual routines in the context of real applications. To prove the feasibility of our techniques, we implemented a Valgrind tool called aprof and performed an extensive experimental evaluation on the SPEC CPU2006 benchmarks. Our experiments show that aprof delivers comparable performance to other prominent Valgrind tools, and can generate informative plots even from single runs on typical workloads for most algorithmically-critical routines.
- An efficient variation of bubble sort. Information Processing Letters, 11 (1): 5--6, 1980.Google ScholarCross Ref
- G. Ammons, T. Ball, and J. R. Larus. Exploiting hardware performance counters with flow and context sensitive profiling. SIGPLAN Not., 32 (5): 85--96, 1997. Google ScholarDigital Library
- M. Arnold and B. Ryder. A framework for reducing the cost of instrumented code. In PLDI, pages 168--179. ACM, 2001. Google ScholarDigital Library
- T. Ball and J. R. Larus. Efficient path profiling. In MICRO 29: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture, pages 46--57, 1996. Google ScholarDigital Library
- T. Ball, P. Mataga, and M. Sagiv. Edge profiling versus path profiling: the showdown. In POPL, pages 134--148. ACM, 1998. Google ScholarDigital Library
- M. D. Bond and K. S. McKinley. Practical path profiling for dynamic optimizers. In CGO, pages 205--216. IEEE Computer Society, 2005. Google ScholarDigital Library
- M. D. Bond and K. S. McKinley. Probabilistic calling context. SIGPLAN Not. (Proc. OOPSLA 2007), 42 (10): 97--112, 2007. Google ScholarDigital Library
- R. E. Bryant. Personal communication, September 2011.Google Scholar
- R. E. Bryant and D. R. O'Hallaron. Computer Systems: A Programmer's Perspective. Pearson Education, 2010. Google ScholarDigital Library
- R. E. Bryant, D. L. Beatty, K. S. Brace, K. Cho, and T. J. Sheffler. COSMOS: A Compiled Simulator for MOS Circuits. In DAC, pages 9--16, 1987. Google ScholarDigital Library
- J. M. Chambers, W. S. Cleveland, B. Kleiner, and P. A. Tukey. Graphical Methods for Data Analysis. Chapman and Hall, New York, 1983.Google Scholar
- J. Chow, T. Garfinkel, and P. M. Chen. Decoupling dynamic program analysis from execution in virtual environments. In USENIX 2008 Annual Technical Conference, pages 1--14, 2008. Google ScholarDigital Library
- D. C. D'Elia, C. Demetrescu, and I. Finocchi. Mining hot calling contexts in small space. In M. W. Hall and D. A. Padua, editors, PLDI, pages 516--527. ACM, 2011. Google ScholarDigital Library
- C. Demetrescu, I. Finocchi, and G. F. Italiano. Algorithm engineering. Bulletin of the EATCS (algorithmics column), 79: 48--63, 2003.Google Scholar
- Fedora Project. wf: simple word frequency counter. http:// rpmfind.net//linux/RPM/fedora/devel/rawhide/src/w/wf-0.41-6.fc17.src.html.Google Scholar
- N. Froyd, J. Mellor-Crummey, and R. Fowler. Low-overhead call path profiling of unmodified, optimized code. In Proc. 19th Annual International Conf. on Supercomputing, pages 81--90. ACM, 2005. Google ScholarDigital Library
- Genome bioinformatics research laboratory. Resources and datasets. http://genome.crg.es/main/databases.html.Google Scholar
- S. Goldsmith, A. Aiken, and D. S. Wilkerson. Measuring empirical computational complexity. In ESEC/SIGSOFT FSE, pages 395--404, 2007. Google ScholarDigital Library
- S. L. Graham, P. B. Kessler, and M. K. McKusick. gprof: a call graph execution profiler (with retrospective). In K. S. McKinley, editor, Best of PLDI, pages 49--57. ACM, 1982.Google Scholar
- R. J. Hall. Call path refinement profiles. IEEE Trans. Softw. Eng., 21 (6): 481--496, 1995. Google ScholarDigital Library
- R. J. Hall and A. J. Goldberg. Call path profiling of monotonic program resources in UNIX. In Proc. Summer 1993 USENIX Technical Conference, pages 1--19. USENIX Association, 1993. Google ScholarDigital Library
- M. Hart. Gutenberg Project. http://www.gutenberg.org/.Google Scholar
- J. L. Henning. Spec cpu2006 benchmark descriptions. SIGARCH Comput. Archit. News, 34: 1--17, 2006. Google ScholarDigital Library
- M. Hirzel and T. Chilimbi. Bursty tracing: A framework for low-overhead temporal profiling. In Proc. 4th ACM Workshop on Feedback-Directed and Dynamic Optimization, 2001.Google Scholar
- D. Johnson. A theoretician's guide to the experimental analysis of algorithms. In Data Structures, Near Neighbor Searches, and Methodology, pages 215--250. American Mathematical Society, 2002.Google ScholarCross Ref
- R. Joshi, M. D. Bond, and C. Zilles. Targeted path profiling: Lower overhead path profiling for staged dynamic optimization systems. In CGO, pages 239--250. IEEE Computer Society, 2004. Google ScholarDigital Library
- D. E. Knuth and F. R. Stevenson. Optimal measurement points for program frequency counts. BIT, 13: 313--322, 1973.Google ScholarCross Ref
- T. Küstner, J. Weidendorfer, and T. Weinzierl. Argument controlled profiling. In Euro-Par'09, pages 177--184, 2010. Google ScholarDigital Library
- G. Marin and J. M. Mellor-Crummey. Cross-architecture performance predictions for scientific applications using parameterized models. In Proc. SIGMETRICS 2004, pages 2--13, 2004. Google ScholarDigital Library
- C. C. McGeoch. Experimental algorithmics. Communications of the ACM, 50 (11): 27--31, 2007. Google ScholarDigital Library
- C. C. McGeoch, P. Sanders, R. Fleischer, P. R. Cohen, and D. Precup. Using finite experiments to study asymptotic performance. In Experimental Algorithmics, LNCS 2547, pages 93--126, 2002. Google ScholarDigital Library
- D. Melski and T. W. Reps. Interprocedural path profiling. In Proc. 8th Int. Conf. on Compiler Construction, LNCS 1575, pages 47--62, 1999. Google ScholarDigital Library
- B. M. E. Moret. Towards a discipline of experimental algorithmics. In Data Structures, Near Neighbor Searches, and Methodology, pages 197--250. American Mathematical Society, 2002.Google ScholarCross Ref
- N. Nethercote and J. Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In PLDI, pages 89--100, 2007. Google ScholarDigital Library
- D. Ofelt and J. L. Hennessy. Efficient performance prediction for modern microprocessors. In SIGMETRICS, pages 229--239, 2000. Google ScholarDigital Library
- C. Ponder and R. J. Fateman. Inaccuracies in program profilers. Softw., Pract. Exper., 18 (5): 459--467, 1988. Google ScholarDigital Library
- S. Roy and Y. N. Srikant. Profiling k-iteration paths: A generalization of the ball-larus profiling algorithm. In CGO, pages 70--80, 2009. Google ScholarDigital Library
- J. Seward and N. Nethercote. Using Valgrind to detect undefined value errors with bit-precision. In USENIX Annual Technical Conference, pages 17--30. USENIX, 2005. Google ScholarDigital Library
- J. M. Spivey. Fast, accurate call graph profiling. Softw., Pract. Exper., 34 (3): 249--264, 2004. Google ScholarDigital Library
- R. E. Tarjan. Efficiency of a good but not linear set union algorithm. J. ACM, 22 (2): 215--225, 1975. Google ScholarDigital Library
- K. Vaswani, A. V. Nori, and T. M. Chilimbi. Preferential path profiling: compactly numbering interesting paths. In POPL, pages 351--362. ACM, 2007. Google ScholarDigital Library
- J. Weidendorfer, M. Kowarschik, and C. Trinitis. A tool suite for simulation based analysis of memory access behavior. In Int. Conf. on Computational Science, LNCS 3038, pages 440--447, 2004.Google ScholarCross Ref
- J. Whaley. A portable sampling-based profiler for Java virtual machines. In Proc. ACM 2000 Conf. on Java Grande, pages 78--87, 2000. Google ScholarDigital Library
- R. Wilhelm, J. Engblom, A. Ermedahl, N. Holsti, S. Thesing, D. B. Whalley, G. Bernat, C. Ferdinand, R. Heckmann, T. Mitra, F. Mueller, I. Puaut, P. P. Puschner, J. Staschulat, and P. Stenström. The worst-case execution-time problem - overview of methods and survey of tools. ACM Trans. Embedded Comput. Syst., 7 (3), 2008. Google ScholarDigital Library
- T. Williams and C. Kelley. Gnuplot: command-driven interactive function plotting program. http://www.gnuplot.info/.Google Scholar
- X. Zhuang, M. J. Serrano, H. W. Cain, and J.-D. Choi. Accurate, efficient, and adaptive calling context profiling. In PLDI, pages 263--271, 2006. Google ScholarDigital Library
Index Terms
- Input-sensitive profiling
Recommendations
Input-sensitive profiling
PLDI '12: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and ImplementationIn this paper we present a profiling methodology and toolkit for helping developers discover hidden asymptotic inefficiencies in the code. From one or more runs of a program, our profiler automatically measures how the performance of individual routines ...
Estimating the Empirical Cost Function of Routines with Dynamic Workloads
CGO '14: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and OptimizationA crucial aspect in software development is understanding how an application's performance scales as a function of its input data. Estimating the empirical cost function of individual routines of a program can help developers predict the runtime on ...
Estimating the Empirical Cost Function of Routines with Dynamic Workloads
CGO '14: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and OptimizationA crucial aspect in software development is understanding how an application's performance scales as a function of its input data. Estimating the empirical cost function of individual routines of a program can help developers predict the runtime on ...
Comments