ABSTRACT
Recent interest in dynamic scheduling as a means of exploiting instruction-level parallelism has drawn significant attention to the scalability of dynamic scheduling hardware. To overcome the scalability problems of centralized hardware schedulers, many decentralized execution models have recently been proposed and investigated. The crux of all these models is to split the instruction window across multiple processing elements (PEs) that schedule instructions independently. The decentralized execution models proposed so far can be grouped into three categories, based on the criterion used for assigning an instruction to a particular PE: (i) execution unit dependence based decentralization (EDD), (ii) control dependence based decentralization (CDD), and (iii) data dependence based decentralization (DDD). This paper investigates the performance of these three decentralization approaches. Using a suite of important benchmarks and realistic system parameters, we examine performance differences resulting from the type of partitioning as well as from specific implementation issues such as the type of PE interconnect. We found that with a ring-type PE interconnect, the DDD approach performs best when the number of PEs is moderate, and the CDD approach performs best when the number of PEs is large. The currently used approach, EDD, does not perform well for any configuration. With a realistic crossbar, performance does not increase with the number of PEs for any of the partitioning approaches. The results give insight into the best way to use the available transistor budget for implementing the instruction window.
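To make the three partitioning criteria concrete, here is a minimal sketch of how an instruction might be steered to a PE under each policy. This is not taken from the paper: the `Instruction` record, the opcode-to-unit table, and all function names are hypothetical illustrations of the general idea behind EDD, CDD, and DDD.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

NUM_PES = 4  # number of processing elements (hypothetical)

@dataclass
class Instruction:
    opcode: str                                     # e.g. "add", "mul", "load"
    srcs: List[str] = field(default_factory=list)   # source registers
    dest: Optional[str] = None                      # destination register

def assign_edd(inst: Instruction) -> int:
    """EDD: steer by the execution-unit type the instruction needs
    (ALU, multiplier, memory port, branch unit), so each PE serves
    one class of functional units."""
    unit_of = {"add": 0, "sub": 0, "mul": 1, "div": 1,
               "load": 2, "store": 2, "branch": 3}
    return unit_of.get(inst.opcode, 0) % NUM_PES

def assign_cdd(inst: Instruction, task_id: int) -> int:
    """CDD: steer by control-dependence region, e.g. all instructions
    of one task/trace land on the same PE."""
    return task_id % NUM_PES

def assign_ddd(inst: Instruction, producer_pe: Dict[str, int],
               fallback: int = 0) -> int:
    """DDD: steer to the PE that produces one of the source operands,
    keeping data-dependence chains local; otherwise use a fallback."""
    for reg in inst.srcs:
        if reg in producer_pe:
            return producer_pe[reg]
    return fallback
```

Under DDD, the `producer_pe` map would be updated as instructions dispatch (recording which PE will produce each destination register), so that consumers follow their producers and inter-PE operand traffic stays low.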