ABSTRACT
Chip multiprocessing has become an exciting new direction for system designers to deliver increased performance by exploiting CMOS scaling. We discuss key design decisions facing the system architect of a chip multiprocessor and describe how these choices were made in the design of the Cell Broadband Engine.

An important decision is whether to base system performance on thread-level parallelism alone, or to complement thread-level parallelism with other forms of parallelism. Depending on workload characteristics, providing parallelism at the processor core level may increase overall system efficiency.

Parallelism is also key to utilizing available memory bandwidth more efficiently, by overlapping and interleaving multiple accesses to system memory. Interleaving the access streams of multiple threads increases memory-level parallelism and allows better utilization of the memory interface. In addition, compute-transfer parallelism (CTP) offers a new form of parallelism, in which memory transfers are initiated under software control without stalling the requesting thread.

We describe how the Cell Broadband Engine uses parallelism at all levels of the system abstraction to deliver a quantum leap in application performance, and how the Cell Synergistic Memory Flow engine exploits compute-transfer parallelism by providing efficient block transfer capabilities.
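The idea of compute-transfer parallelism can be illustrated with a minimal double-buffering sketch. This is not Cell code and does not use any Cell API: the names `fetch_block` and `sum_double_buffered` are hypothetical, and a single-worker thread pool stands in for a block transfer engine. The point is only that the transfer of block i+1 is started before the computation on block i begins, so the requesting thread waits only when it actually needs data that has not yet arrived.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_block(memory, start, size):
    """Hypothetical stand-in for an asynchronous block transfer from system memory."""
    return memory[start:start + size]


def sum_double_buffered(memory, block_size):
    """Sum `memory` block by block, prefetching the next block while computing."""
    total = 0
    # One worker plays the role of the transfer engine (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=1) as transfer_engine:
        # Start the first transfer before any computation begins.
        pending = transfer_engine.submit(fetch_block, memory, 0, block_size)
        for start in range(0, len(memory), block_size):
            # Wait only for the block that is needed right now.
            current = pending.result()
            next_start = start + block_size
            if next_start < len(memory):
                # Initiate the next transfer; it can proceed during the compute step.
                pending = transfer_engine.submit(
                    fetch_block, memory, next_start, block_size
                )
            # Compute on the block that has already arrived.
            total += sum(current)
    return total
```

Calling `sum_double_buffered(list(range(10)), 4)` processes three blocks and returns the same result as a plain `sum`. The sketch shows the control structure only; any actual overlap depends on the transfer mechanism running independently of the compute thread.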