ABSTRACT
Due to the amount of time required to design a new processor, one set of benchmark programs may be used during the design phase while another may be the standard by the time the design is finally delivered. Using one benchmark suite to design a processor and a different, presumably more current, suite to evaluate its ultimate performance can lead to sub-optimal design decisions if the characteristics of the two suites and their respective compilers differ significantly. We call these changes across time "drift". To evaluate the impact of using yesterday's benchmark and compiler technology to design tomorrow's processors, we compare common benchmarks from the SPEC 95 and SPEC 2000 benchmark suites. Our results yield three key conclusions. First, we show that the amount of drift for common programs in successive SPEC benchmark suites is significant. In SPEC 2000, the main memory access time is a far more significant performance bottleneck than in SPEC 95, while less significant SPEC 2000 performance bottlenecks include the L2 cache latency, the L1 I-cache size, and the number of reorder buffer entries. Second, using two different statistical techniques, we show that compiler drift is not as significant as benchmark drift. Third, we show that benchmark and compiler drift can have a significant impact on final design decisions. Specifically, we use a one-parameter-at-a-time optimization algorithm to design two different year-2000 processors, one optimized for SPEC 95 and the other for SPEC 2000, using the energy-delay product (EDP) as the optimization criterion. The results show that using SPEC 95 to design a year-2000 processor yields an 18.5% larger EDP and a 20.8% higher CPI than using the SPEC 2000 benchmarks to design the corresponding processor. Finally, we make a few recommendations to help computer architects minimize the effects of benchmark and compiler drift.
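To make the final experiment concrete, the sketch below shows the shape of a one-parameter-at-a-time search that minimizes the energy-delay product. This is a minimal illustration, not the paper's actual method: the DESIGN_SPACE parameters, the toy simulate() cost model, and all numeric constants are hypothetical, and the paper presumably drives a detailed cycle-level power/performance simulator at that step instead.

```python
# Minimal sketch of one-parameter-at-a-time optimization with the
# energy-delay product (EDP) as the objective. All parameter names,
# value ranges, and the analytic simulate() stand-in are hypothetical.

from typing import Dict, List, Tuple

# Hypothetical design space: each parameter and its candidate values.
DESIGN_SPACE: Dict[str, List[int]] = {
    "l1_icache_kb": [8, 16, 32, 64],
    "l2_latency_cycles": [6, 8, 12, 16],
    "rob_entries": [32, 64, 128],
}

def simulate(config: Dict[str, int]) -> Tuple[float, float]:
    """Stand-in for a full timing/power simulation of one benchmark
    suite; returns (energy, delay) for the given configuration."""
    # Toy analytic model purely for illustration: larger structures
    # cost energy but reduce delay, so the EDP optimum is interior.
    energy = (config["l1_icache_kb"] * 0.4
              + config["rob_entries"] * 0.1
              + config["l2_latency_cycles"] * 0.05)
    delay = (100.0 / config["l1_icache_kb"]
             + 50.0 / config["rob_entries"]
             + 0.5 * config["l2_latency_cycles"])
    return energy, delay

def edp(config: Dict[str, int]) -> float:
    energy, delay = simulate(config)
    return energy * delay  # energy-delay product

def optimize_one_at_a_time(space: Dict[str, List[int]]) -> Dict[str, int]:
    """Fix all parameters at a baseline, then sweep one parameter at a
    time, keeping the EDP-minimizing value before moving on."""
    config = {name: values[0] for name, values in space.items()}
    for name, values in space.items():
        config[name] = min(values, key=lambda v: edp({**config, name: v}))
    return config

if __name__ == "__main__":
    best = optimize_one_at_a_time(DESIGN_SPACE)
    print("selected configuration:", best, "EDP:", round(edp(best), 2))
```

The appeal of this search is cost: it needs only the sum, not the product, of the per-parameter sweep sizes. Its weakness is that it can miss interactions between parameters, which is one reason the benchmark suite used to drive it, and any drift in that suite, can steer the final design.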