|
ABSTRACT
In this paper, we survey the design space of a new class of architectures called Grid Processor Architectures (GPAs). These architectures are designed to scale with technology, allowing faster clock rates than conventional architectures while providing superior instruction-level parallelism on traditional workloads and high performance across a range of application classes. A GPA consists of an array of ALUs, each with limited control, connected by a thin operand network. Programs are executed by mapping blocks of statically scheduled instructions to the ALU array and executing them dynamically in dataflow order. This organization enables the critical paths of instruction blocks to be executed on chains of ALUs without transmitting temporary values back to the register file, avoiding most of the large, unscalable structures that limit the scalability of conventional architectures. Finally, we present simulation results of a preliminary design, the GPA-1. With a half-cycle routing delay, we obtain performance roughly equal to an ideal 8-way, 512-entry window superscalar core. With no inter-ALU delay, perfect memory, and perfect branch prediction, the IPC of the GPA-1 is more than twice that of the ideal superscalar core, achieving an average of 11 IPC across nine SPEC CPU2000 and Mediabench benchmarks.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Vikas Agarwal , M. S. Hrishikesh , Stephen W. Keckler , Doug Burger, Clock rate versus IPC: the end of the road for conventional microarchitectures, Proceedings of the 27th annual international symposium on Computer architecture, p.248-259, June 2000, Vancouver, British Columbia, Canada
|
| |
2
|
|
 |
3
|
David I. August , Daniel A. Connors , Scott A. Mahlke , John W. Sias , Kevin M. Crozier , Ben-Chung Cheng , Patrick R. Eaton , Qudus B. Olaniran , Wen-mei W. Hwu, Integrated predicated and speculative execution in the IMPACT EPIC architecture, Proceedings of the 25th annual international symposium on Computer architecture, p.227-237, June 27-July 02, 1998, Barcelona, Spain
|
| |
4
|
D. Burger and T. M. Austin. The SimpleScalar tool set version 2.0. Technical Report 1342, Computer Sciences Department, University of Wisconsin, June 1997.
|
| |
5
|
H. Corporaal. Transport Triggered Architectures. PhD thesis, Delft University of Technology, September 1995.
|
 |
6
|
|
 |
7
|
David E. Culler , Anurag Sah , Klaus E. Schauser , Thorsten von Eicken , John Wawrzynek, Fine-grain parallelism with minimal hardware support: a compiler-controlled threaded abstract machine, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.164-175, April 08-11, 1991, Santa Clara, California, United States
|
| |
8
|
|
 |
9
|
|
 |
10
|
|
| |
11
|
Eric Hao , Po-Yung Chang , Marius Evers , Yale N. Patt, Increasing the instruction fetch rate via block-structured instruction set architectures, Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture, p.191-200, December 02-04, 1996, Paris, France
|
| |
12
|
G. Hinton, D. Sager, M. Upton, D. Boggs, D. Carmean, A. Kyker, and P. Roussel. The microarchitecture of the Pentium 4 processor, Intel Technology Journal QI, 2001.
|
| |
13
|
|
| |
14
|
Chunho Lee , Miodrag Potkonjak , William H. Mangione-Smith, MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.330-335, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
| |
15
|
|
 |
16
|
Scott A. Mahlke , David C. Lin , William Y. Chen , Richard E. Hank , Roger A. Bringmann, Effective compiler support for predicated execution using the hyperblock, Proceedings of the 25th annual international symposium on Microarchitecture, p.45-54, December 01-04, 1992, Portland, Oregon, United States
|
 |
17
|
Subbarao Palacharla , Norman P. Jouppi , J. E. Smith, Complexity-effective superscalar processors, Proceedings of the 24th annual international symposium on Computer architecture, p.206-218, June 01-04, 1997, Denver, Colorado, United States
|
| |
18
|
L. Peh and W. J. Dally. Flit-reservation flow control. In Proceedings of 6th International Symposium on High-Performance Computer Architecture, pages 73-84, January 2000.
|
 |
19
|
|
| |
20
|
|
| |
21
|
Eric Rotenberg , Quinn Jacobson , Yiannakis Sazeides , Jim Smith, Trace processors, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.138-148, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
| |
22
|
The national technology roadmap for semiconductors. Semiconductor Industry Association, 1999.
|
| |
23
|
|
 |
24
|
|
| |
25
|
M. B. Taylor, J. Kim, J. Miller, F. Ghodrat, B. Greenwald, P. Johnson, W. Lee, A. Ma, N. Shnidman, V. Strumpen, D. Wentzlaff, M. Frank, S. Amarasinghe, and A. Agarwal. The Raw processor - a scalable 32-bit fabric for embedded and general purpose computing. In Proceedings of Hot Chips XIII, August 2001.
|
| |
26
|
Trimaran : An infrastructure for research in instruction-level parallelism, http://www.trimaran.org.
|
| |
27
|
A. K. Uht, D. Morano, A. Khalafi, M. de Alba, T. Wenisch, M. Ashouei, and D. Kaeli. IPC in the 10's via resource flow computing with Levo. Technical Report 092001-001, Department of Electrical and Computer Engineering, University of Rhode Island, Kingston, RI, September 2001.
|
 |
28
|
|
| |
29
|
Elliot Waingold , Michael Taylor , Devabhaktuni Srikrishna , Vivek Sarkar , Walter Lee , Victor Lee , Jang Kim , Matthew Frank , Peter Finch , Rajeev Barua , Jonathan Babb , Saman Amarasinghe , Anant Agarwal, Baring It All to Software: Raw Machines, Computer, v.30 n.9, p.86-93, September 1997
[doi> 10.1109/2.612254
]
|
CITED BY 34
|
|
|
|
Steven Swanson , Andrew Putnam , Martha Mercaldi , Ken Michelson , Andrew Petersen , Andrew Schwerin , Mark Oskin , Susan J. Eggers, Area-Performance Trade-offs in Tiled Dataflow Architectures, ACM SIGARCH Computer Architecture News, v.34 n.2, p.314-326, May 2006
|
|
Andrew Petersen , Andrew Putnam , Martha Mercaldi , Andrew Schwerin , Susan Eggers , Steve Swanson , Mark Oskin, Reducing control overhead in dataflow architectures, Proceedings of the 15th international conference on Parallel architectures and compilation techniques, September 16-20, 2006, Seattle, Washington, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Karthikeyan Sankaralingam , Ramadass Nagarajan , Robert McDonald , Rajagopalan Desikan , Saurabh Drolia , M. S. Govindan , Paul Gratz , Divya Gulati , Heather Hanson , Changkyu Kim , Haiming Liu , Nitya Ranganathan , Simha Sethumadhavan , Sadia Sharif , Premkishore Shivakumar , Stephen W. Keckler , Doug Burger, Distributed Microarchitectural Protocols in the TRIPS Prototype Processor, Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, p.480-491, December 09-13, 2006
|
|
Karthikeyan Sankaralingam , Ramadass Nagarajan , Haiming Liu , Changkyu Kim , Jaehyuk Huh , Doug Burger , Stephen W. Keckler , Charles R. Moore, Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture, ACM SIGARCH Computer Architecture News, v.31 n.2, May 2003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Martha Mercaldi , Steven Swanson , Andrew Petersen , Andrew Putnam , Andrew Schwerin , Mark Oskin , Susan J. Eggers, Instruction scheduling for a tiled dataflow architecture, ACM SIGOPS Operating Systems Review, v.40 n.5, December 2006
|
|
|
|
|
|
Karthikeyan Sankaralingam , Ramadass Nagarajan , Haiming Liu , Changkyu Kim , Jaehyuk Huh , Nitya Ranganathan , Doug Burger , Stephen W. Keckler , Robert G. McDonald , Charles R. Moore, TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP, ACM Transactions on Architecture and Code Optimization (TACO), v.1 n.1, p.62-93, March 2004
|
|
|
Aaron Smith , Jon Gibson , Bertrand Maher , Nick Nethercote , Bill Yoder , Doug Burger , Kathryn S. McKinle , Jim Burrill, Compiling for EDGE Architectures, Proceedings of the International Symposium on Code Generation and Optimization, p.185-195, March 26-29, 2006
|
|
|
|
|
|
|
John Oliver , Ravishankar Rao , Paul Sultana , Jedidiah Crandall , Erik Czernikowski , Leslie W. Jones IV , Diana Franklin , Venkatesh Akella , Frederic T. Chong, Synchroscalar: A Multiple Clock Domain, Power-Aware, Tile-Based Embedded Processor, ACM SIGARCH Computer Architecture News, v.32 n.2, p.150, March 2004
|
|
|
|
|
|
|
|
Michael Bedford Taylor , Walter Lee , Jason Miller , David Wentzlaff , Ian Bratt , Ben Greenwald , Henry Hoffmann , Paul Johnson , Jason Kim , James Psota , Arvind Saraf , Nathan Shnidman , Volker Strumpen , Matt Frank , Saman Amarasinghe , Anant Agarwal, Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams, ACM SIGARCH Computer Architecture News, v.32 n.2, p.2, March 2004
|
|
|
|
Steven Swanson , Andrew Schwerin , Martha Mercaldi , Andrew Petersen , Andrew Putnam , Ken Michelson , Mark Oskin , Susan J. Eggers, The WaveScalar architecture, ACM Transactions on Computer Systems (TOCS), v.25 n.2, p.4-es, May 2007
|
|
|
|
|
|