ABSTRACT
Increased integration in the form of multiple processor cores on a single die, relatively constant die sizes, shrinking power envelopes, and emerging applications create a new challenge for processor architects: how to build a processor that provides high single-thread performance, allows several such processors to be placed on the same die for high throughput, and adapts dynamically to future applications. Conventional approaches to high single-thread performance rely on large, complex cores that sustain a large instruction window to tolerate memory latency, which makes them unsuitable for multi-core chips. We present Continual Flow Pipelines (CFP), a new non-blocking processor pipeline architecture that achieves the performance of a large instruction window without requiring cycle-critical structures such as the scheduler and register file to be large. We show that achieving the benefits of a large instruction window requires addressing inefficiencies in the management of both the scheduler and the register file, and we propose a unified solution. The non-blocking property of CFP keeps small the key processor structures that affect cycle time and power (scheduler, register file) and die size (second-level cache). The memory-latency-tolerant CFP core allows multiple cores on a single die while outperforming current processor cores on single-thread applications.