Article

A case for resource-conscious out-of-order processors: towards kilo-instruction in-flight processors

Authors: Adrián Cristal, José F. Martínez, Mateo Valero

MEDEA '03: Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications, systems and architecture

Pages 3 - 10

https://doi.org/10.1145/1152923.1024296

Published: 27 September 2003

Abstract

Modern out-of-order processors tolerate long-latency memory operations by supporting a large number of in-flight instructions. This is achieved in part through proper sizing of critical resources, such as register files or instruction queues. In light of the increasing gap between processor speed and memory latency, tolerating upcoming latencies in this way would require impractical sizes of such critical resources. To tackle this scalability problem, we make a case for resource-conscious out-of-order processors. We present quantitative evidence that critical resources are increasingly underutilized in these processors. We advocate that better use of such resources should be a priority in future research in processor architectures. In particular, we present some of our own research that builds on these observations to address future resource-conscious processors.

Published In

MEDEA '03: Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications, systems and architecture

September 2003

75 pages

ISBN:9781450378208

DOI:10.1145/1152923

Conference Chairs:
Sandro Bartolini (University of Siena, Italy)
Pierfrancesco Foglia (University of Pisa, Italy)
Roberto Giorgi (University of Siena, Italy)
Cosimo Antonio Prete (University of Pisa, Italy)

ACM SIGARCH Computer Architecture News Volume 32, Issue 3
Special issue: MEDEA-2003 workshop
June 2004
81 pages
ISSN:0163-5964
DOI:10.1145/1024295

Copyright © 2003 Copyright is held by the owner/author(s).

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Acceptance Rates

Overall Acceptance Rate 6 of 9 submissions, 67%

Cited By

Golander A, Weiss S (2008). Hiding the misprediction penalty of a resource-efficient high-performance processor. ACM Transactions on Architecture and Code Optimization 4:4 (1-32). Online publication date: 30-Jan-2008.
https://dl.acm.org/doi/10.1145/1328195.1328201
Galluzzi M, Vallejo E, Cristal A, Vallejo F, Beivide R, Stenström P, Smith J, Valero M (2007). Implicit Transactional Memory in Kilo-Instruction Multiprocessors. Advances in Computer Systems Architecture (339-353). Online publication date: 2007.
https://doi.org/10.1007/978-3-540-74309-5_32
Akkary H, Rajwar R, Srinivasan S (2004). An analysis of a resource efficient checkpoint architecture. ACM Transactions on Architecture and Code Optimization 1:4 (418-444). Online publication date: 1-Dec-2004.
https://dl.acm.org/doi/10.1145/1044823.1044826
Cristal A, Santana O, Valero M, Martínez J (2004). Toward kilo-instruction processors. ACM Transactions on Architecture and Code Optimization 1:4 (389-417). Online publication date: 1-Dec-2004.
https://dl.acm.org/doi/10.1145/1044823.1044825
Cristal A, Ortega D, Llosa J, Valero M (2004). Out-of-Order Commit Processors. Proceedings of the 10th International Symposium on High Performance Computer Architecture. Online publication date: 14-Feb-2004.
https://dl.acm.org/doi/10.1109/HPCA.2004.10008
Cristal A, Ortega D, Llosa J, Valero M (2003). Kilo-instruction Processors. High Performance Computing (10-25). Online publication date: 2003.
https://doi.org/10.1007/978-3-540-39707-6_2
Pericas M, Cristal A, Cazorla F, Gonzalez R, Jimenez D, Valero M (2007). A Flexible Heterogeneous Multi-Core Architecture. 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007) (13-24). Online publication date: Sep-2007.
https://doi.org/10.1109/PACT.2007.4336196
Pericas M, Cristal A, Gonzalez R, Jimenez D, Valero M (2006). A Decoupled KILO-Instruction Processor. The Twelfth International Symposium on High-Performance Computer Architecture, 2006 (52-63). Online publication date: 2006.
https://doi.org/10.1109/HPCA.2006.1598112
Sethumadhavan S, Desikan R, Burger D, Moore C, Keckler S (2004). Scalable Hardware Memory Disambiguation for High-ILP Processors. IEEE Micro 24:6 (118-127). Online publication date: 1-Nov-2004.
https://dl.acm.org/doi/10.1109/MM.2004.87
