ABSTRACT
Mambo [4] is IBM's full-system simulator for PowerPC systems. It provides a complete set of simulation tools that help IBM and its partners with pre-hardware development and performance evaluation of future systems. Mambo currently simulates a target system on a single host thread, so per-core simulation performance drops as the number of cores in the target system increases. As the "multi-core era" approaches, both target and host systems will have more and more cores, and it becomes important for Mambo to simulate a multi-core target system efficiently on a multi-core host system. Parallelization is a natural way to speed up Mambo in this situation. Parallel Mambo (P-Mambo) is a multi-threaded implementation of Mambo. Mambo's simulation engine is implemented as a user-level thread scheduler; we propose a multi-scheduler method to adapt it to multi-threaded execution. Based on this method, we propose a core-based module partition that achieves both high inter-scheduler parallelism and low inter-scheduler dependency. Protecting shared resources is crucial to both the correctness and the performance of P-Mambo. Because P-Mambo has two tiers of threads (OS-level and user-level), protecting shared resources with OS-level locks alone can introduce deadlocks when a user-level context switch occurs. We propose a new lock mechanism to handle this problem. Because Mambo is an ongoing project with many modules under development, P-Mambo must also coexist with new modules. We propose a global-lock-based method to guarantee compatibility of P-Mambo with future Mambo modules. We have implemented the first version of P-Mambo in functional modes and evaluated its performance on the OpenMP implementation of the NAS Parallel Benchmarks (NPB) 3.2 [12]. Preliminary experimental results show that P-Mambo achieves an average speedup of 3.4 on a 4-core host machine.
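The deadlock hazard mentioned above can be illustrated with a small sketch. It is not P-Mambo's lock mechanism, which the abstract does not describe. It assumes generators as stand-ins for user-level threads, a hypothetical round-robin `run_scheduler` on one OS thread, and a hypothetical helper `acquire_yielding` that retries a non-blocking acquire and yields to the scheduler on failure. If `task_b` instead called a blocking `lock.acquire()`, it would stall the only OS thread while `task_a`, suspended by a user-level context switch, still held the lock.

```python
import threading
from collections import deque


def acquire_yielding(lock):
    """Hypothetical helper: retry a non-blocking acquire, yielding on failure."""
    while not lock.acquire(blocking=False):
        yield  # let other user-level threads on this scheduler run


def task_a(lock, log):
    yield from acquire_yielding(lock)
    log.append("A holds lock")
    yield  # user-level context switch while still holding the lock
    lock.release()
    log.append("A released lock")


def task_b(lock, log):
    # A blocking lock.acquire() here would stall the OS thread forever,
    # because task_a could never be resumed to release the lock.
    yield from acquire_yielding(lock)
    log.append("B holds lock")
    lock.release()


def run_scheduler(tasks):
    """Round-robin user-level scheduler running on a single OS thread."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task until its next yield
            ready.append(task)  # still runnable: put it back in the queue
        except StopIteration:
            pass                # task finished


def demo():
    lock = threading.Lock()
    log = []
    run_scheduler([task_a(lock, log), task_b(lock, log)])
    return log
```

Calling `demo()` runs both tasks to completion: `task_b` fails its first acquire attempt, yields, and obtains the lock only after `task_a` has been rescheduled and has released it.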
REFERENCES
[3] N. L. Binkert, R. G. Dreslinski, L. R. Hsu, K. T. Lim, A. G. Saidi, and S. K. Reinhardt. Multifacet's General Execution-driven Multiprocessor Simulator (GEMS) Toolset. Computer Architecture News (CAN), September 2005.
[4] Patrick Bohrer, James Peterson, Mootaz Elnozahy, Ram Rajamony, Ahmed Gheith, Ron Rockhold, Charles Lefurgy, Hazim Shafi, Tarun Nakra, Rick Simpson, Evan Speight, Kartik Sudeep, Eric Van Hensbergen, and Lixin Zhang. Mambo: a full system simulator for the PowerPC architecture. ACM SIGMETRICS Performance Evaluation Review, v.31 n.4, p.8-12, March 2004. doi:10.1145/1054907.1054910
[5] D. Burger, T. M. Austin, and S. Bennett. Evaluating Future Microprocessors: The SimpleScalar Tool Set. Technical Report CS-TR-1996-1308, 1996.
[6] L. Ceze, K. Strauss, G. Almasi, P. J. Bohrer, J. R. Brunheroto, C. Cascaval, J. G. Castanos, D. Lieber, X. Martorell, J. E. Moreira, A. Sanomiya, and E. Schenfeld. Full Circle: Simulating Linux Clusters on Linux Clusters. In Proceedings of the Fourth LCI International Conference on Linux Clusters: The HPC Revolution 2003, June 2003.
[7] Derek Chiou, Dam Sunwoo, Joonsoo Kim, Nikhil A. Patil, William Reinhart, Darrel Eric Johnson, Jebediah Keefe, and Hari Angepat. FPGA-Accelerated Simulation Technologies (FAST): Fast, Full-System, Cycle-Accurate Simulators. Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, p.249-261, December 01-05, 2007. doi:10.1109/MICRO.2007.16
[8] Cathy May, Ed Silha, Rick Simpson, Hank Warren, and International Business Machines, Inc. The PowerPC architecture: a specification for a new family of RISC processors. Morgan Kaufmann Publishers Inc., San Francisco, CA, 1994.
[10] Peter S. Magnusson, Magnus Christensson, Jesper Eskilson, Daniel Forsgren, Gustav Hållberg, Johan Högberg, Fredrik Larsson, Andreas Moestedt, and Bengt Werner. Simics: A Full System Simulation Platform. Computer, v.35 n.2, p.50-58, February 2002. doi:10.1109/2.982916
[11] Njuguna Njoroge, Jared Casper, Sewook Wee, Yuriy Teslyar, Daxia Ge, Christos Kozyrakis, and Kunle Olukotun. ATLAS: a chip-multiprocessor with transactional memory support. Proceedings of the conference on Design, automation and test in Europe, April 16-20, 2007, Nice, France.
[12] NPB. NAS Parallel Benchmarks. http://www.nas.nasa.gov/Resources/Software/npb.html.
[15] NR Adiga, G Almasi, GS Almasi, Y Aridor, R Barik, D Beece, R Bellofatto, G Bhanot, R Bickford, M Blumrich, AA Bright, J Brunheroto, C Caşcaval, J Castaños, W Chan, L Ceze, P Coteus, S Chatterjee, D Chen, G Chiu, TM Cipolla, P Crumley, KM Desai, A Deutsch, T Domany, MB Dombrowa, W Donath, M Eleftheriou, C Erway, J Esch, B Fitch, J Gagliano, A Gara, R Garg, R Germain, ME Giampapa, B Gopalsamy, J Gunnels, M Gupta, F Gustavson, S Hall, RA Haring, D Heidel, P Heidelberger, LM Herger, D Hoenicke, RD Jackson, T Jamal-Eddine, GV Kopcsay, E Krevat, MP Kurhekar, AP Lanzetta, D Lieber, LK Liu, M Lu, M Mendell, A Misra, Y Moatti, L Mok, JE Moreira, BJ Nathanson, M Newton, M Ohmacht, A Oliner, V Pandit, RB Pudota, R Rand, R Regan, B Rubin, A Ruehli, S Rus, RK Sahoo, A Sanomiya, E Schenfeld, M Sharma, E Shmueli, S Singh, P Song, V Srinivasan, BD Steinmacher-Burow, K Strauss, C Surovic, R Swetz, T Takken, RB Tremaine, M Tsao, AR Umamaheshwaran, P Verma, P Vranas, TJC Ward, M Wazlowski, W Barrett, C Engel, B Drehmel, B Hilgart, D Hill, F Kasemkhani, D Krolak, CT Li, T Liebsch, J Marcella, A Muff, A Okomo, M Rouse, A Schram, M Tubbs, G Ulsh, C Wait, J Wittrup, M Bae, K Dockser, L Kissel, MK Seager, JS Vetter, and K Yates. An overview of the BlueGene/L Supercomputer. Proceedings of the 2002 ACM/IEEE conference on Supercomputing, p.1-22, November 16, 2002, Baltimore, Maryland.
[16] Sewook Wee, Jared Casper, Njuguna Njoroge, Yuriy Teslyar, Daxia Ge, Christos Kozyrakis, and Kunle Olukotun. A practical FPGA-based framework for novel CMP research. Proceedings of the 2007 ACM/SIGDA 15th international symposium on Field programmable gate arrays, February 18-20, 2007, Monterey, California, USA. doi:10.1145/1216919.1216936