Abstract
Cycle-accurate simulation is the dominant methodology for processor design space analysis and performance prediction. However, with the prevalence of multi-core, multi-threaded architectures, its extreme slowdowns make it impractical as the sole means of design. We have developed a statistical technique, based on Monte Carlo methods, for modeling multi-core processors. With this technique, processor models of contemporary architectures can be developed and applied to performance prediction, bottleneck detection, and limited design space analysis. To date, we have accurately modeled the IBM Cell, the Intel Itanium, and the Sun Niagara 1 and Niagara 2 processors [23, 22, 8]. In this paper, we present work in progress that applies this methodology to an out-of-order execution processor: the initial single-core model and results for the AMD Barcelona (Opteron) processor.
References
- AMD64 Architecture Programmer's Manual. http://developer.amd.com/documentation/guides.
- CPU2006 benchmark suite. http://www.spec.org/cpu2006/.
- MARSSx86 - micro-architectural and system simulator for x86-based systems. http://www.marss86.org.
- PAPI performance monitoring tool. http://icl.cs.utk.edu/papi/.
- The PIN tool. http://rogue.colorado.edu/Pin/index.html.
- Software Optimization Guide for AMD Family 10h Processors. http://developer.amd.com/documentation/guides.
- The SimpleScalar tool set. http://www.simplescalar.com/.
- W. Alkohlani, J. Cook, and R. Srinivasan. Extending the Monte Carlo Processor Modeling Technique: Statistical Performance Models of the Niagara 2 Processor. Proc of the 39th International Conference on Parallel Processing (ICPP), September 2010.
- N. L. Binkert, R. G. Dreslinski, L. R. Hsu, K. T. Lim, A. G. Saidi, and S. K. Reinhardt. The M5 simulator: Modeling networked systems. IEEE Micro, 26:52--60, 2006.
- S. Eyerman, L. Eeckhout, T. Karkhanis, and J. E. Smith. A mechanistic performance model for superscalar out-of-order processors. ACM Trans. Comput. Syst., 27(2):1--37, 2009.
- D. Genbrugge and L. Eeckhout. Chip multiprocessor design space exploration through statistical simulation. IEEE Transactions on Computers, 58:1668--1681, 2009.
- G. Hamerly, E. Perelman, and B. Calder. How to use SimPoint to pick simulation points. SIGMETRICS Perform. Eval. Rev., 31(4):25--30, 2004.
- J. C. Hoe, D. Burger, J. Emer, D. Chiou, R. Sendag, and J. Yi. The future of architectural simulation. IEEE Micro, 30(3):8--18, 2010.
- P. J. Joseph, K. Vaswani, and M. J. Thazhuthaveetil. A predictive performance model for superscalar processors. In MICRO 39: Proc of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, pages 161--170, Washington, DC, USA, 2006. IEEE Computer Society.
- A. J. KleinOsowski and D. J. Lilja. MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research. IEEE Comput. Archit. Lett., 1(1):7, 2002.
- B. C. Lee, J. D. Collins, H. Wang, and D. Brooks. CPR: Composable performance regression for scalable multiprocessor models. In MICRO, pages 270--281, 2008.
- M. M. K. Martin, D. J. Sorin, B. M. Beckmann, M. R. Marty, M. Xu, A. R. Alameldeen, K. E. Moore, M. D. Hill, and D. A. Wood. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset. SIGARCH Comput. Archit. News, 33:92--99, November 2005.
- S. Nussbaum and J. E. Smith. Modeling superscalar processors via statistical simulation. In PACT '01: Proc of the 2001 International Conference on Parallel Architectures and Compilation Techniques, pages 15--24, Washington, DC, USA, 2001. IEEE Computer Society.
- A. Phansalkar, A. Joshi, and L. K. John. Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite. In ISCA '07: Proc of the 34th Annual International Symposium on Computer Architecture, pages 412--423, New York, NY, USA, 2007. ACM.
- A. Phansalkar, A. Joshi, and L. K. John. Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite. In ISCA '07: Proc of the 34th Annual International Symposium on Computer Architecture, pages 412--423, New York, NY, USA, 2007. ACM.
- J. Renau, B. Fraguela, J. Tuck, W. Liu, M. Prvulovic, L. Ceze, S. Sarangi, P. Sack, K. Strauss, and P. Montesinos. SESC simulator, January 2005. http://sesc.sourceforge.net.
- R. Srinivasan, J. Cook, and O. Lubeck. Performance Modeling Using Monte Carlo Simulation. IEEE Computer Architecture Letters, 5(1):38--41, June 2006.
- R. Srinivasan, J. Cook, and O. Lubeck. Ultra-Fast CPU Performance Prediction: Extending the Monte Carlo Approach. Proc of the IEEE International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pages 107--116, October 2006.
- J. J. Yi, S. V. Kodakara, R. Sendag, D. J. Lilja, and D. M. Hawkins. Characterizing and comparing prevailing simulation techniques. In HPCA '05: Proc of the 11th International Symposium on High-Performance Computer Architecture, pages 266--277, Washington, DC, USA, 2005. IEEE Computer Society.
- H. Zeng, M. Yourst, K. Ghose, and D. Ponomarev. MPTLsim: a simulator for x86 multicore processors. In Proc of the 46th Annual Design Automation Conference, DAC '09, pages 226--231, New York, NY, USA, 2009. ACM.
Index Terms
- A statistical performance model of the Opteron processor