DOI: 10.1145/782814.782840

Predicate prediction for efficient out-of-order execution

Published: 23 June 2003

Abstract

Predicated execution is an important optimization even for an out-of-order processor, since it can eliminate hard-to-predict branches and help enable software pipelining. Using predication with out-of-order execution creates a naming bottleneck, because multiple definitions can reach a use, and not knowing which definition is the correct one can stall the processor.

In this paper, we examine using predicate prediction to speculatively allow execution to proceed in the face of multiple definitions. We show that the penalty for mispredicting a predicate is not as severe as that for mispredicting a branch, which makes it advantageous to replace hard-to-predict branches with predicate predictions. We present a predicate misprediction recovery architecture that replays instructions through the renamer to link up the correct dependencies on a misprediction. This approach allows us to avoid putting the predicted-false-path instructions in the issue queue, reducing the pressure on the dynamic out-of-order scheduler.
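The naming problem and the replay idea described in the abstract can be illustrated with a toy model. The sketch below is not the paper's architecture: the `Instr` record, the `rename` function, and the use of producer indices in place of physical registers are all assumptions made for illustration. It shows two guarded definitions of `r1` reaching one use, how a predicate prediction picks a single producer and keeps the predicted-false instruction out of the issued list, and how re-running the renamer with the resolved predicate values relinks the dependency.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    """Hypothetical instruction record used only for this sketch."""
    dest: str                    # architectural destination register
    srcs: Tuple[str, ...] = ()   # architectural source registers
    pred: Optional[str] = None   # guarding predicate; None means unguarded


def rename(instrs: List[Instr], pred_values: Dict[str, bool]):
    """Toy renamer: link each source to the index of its producer.

    Instructions whose guarding predicate is (predicted) false are
    skipped, so they never reach the issued list and never become
    the producer of their destination register.
    """
    producer: Dict[str, int] = {}   # arch reg -> index of latest live definition
    issued = []
    for i, ins in enumerate(instrs):
        if ins.pred is not None and not pred_values[ins.pred]:
            continue  # predicted false: dropped before the issue queue
        deps = {s: producer.get(s) for s in ins.srcs}
        issued.append((i, deps))
        producer[ins.dest] = i
    return issued


# Two complementary guarded definitions of r1 reach the use in instruction 2.
program = [
    Instr(dest="r1", pred="p1"),      # 0: (p1) r1 = ...
    Instr(dest="r1", pred="p2"),      # 1: (p2) r1 = ...
    Instr(dest="r2", srcs=("r1",)),   # 2:      r2 = f(r1)
]

# Speculate with a predicate prediction: p1 true, p2 false.
speculative = rename(program, {"p1": True, "p2": False})
print(speculative)   # [(0, {}), (2, {'r1': 0})]

# The predicates resolve the other way, so replay through the renamer
# with the actual values to link the use to the correct definition.
replayed = rename(program, {"p1": False, "p2": True})
print(replayed)      # [(1, {}), (2, {'r1': 1})]
```

In this simplified model the whole sequence is replayed; a real recovery mechanism would restart from the mispredicted instruction and would also have to handle in-flight state, which the sketch ignores.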

    Published In

    ICS '03: Proceedings of the 17th annual international conference on Supercomputing
    June 2003
    380 pages
    ISBN:1581137338
    DOI:10.1145/782814

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. predicate prediction
    2. predicated execution

    Conference

    ICS03: International Conference on Supercomputing 2003
    June 23 - 26, 2003
    San Francisco, CA, USA

    Acceptance Rates

    ICS '03 Paper Acceptance Rate 36 of 171 submissions, 21%;
    Overall Acceptance Rate 629 of 2,180 submissions, 29%

    Cited By

    • (2017) Pipelining a triggered processing element. Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 96-108. DOI: 10.1145/3123939.3124551. Online publication date: 14-Oct-2017
    • (2014) Efficient Out-of-Order Execution of Guarded ISAs. ACM Transactions on Architecture and Code Optimization, 11:4, pp. 1-21. DOI: 10.1145/2677037. Online publication date: 8-Dec-2014
    • (2013) Low level conditional move optimization. Acta Cybernetica, 21:1, pp. 5-20. DOI: 10.14232/actacyb.21.1.2013.2. Online publication date: 1-Jan-2013
    • (2013) Using condition flag prediction to improve the performance of out-of-order processors. 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), pp. 1240-1243. DOI: 10.1109/ISCAS.2013.6572077. Online publication date: May-2013
    • (2013) How to implement effective prediction and forwarding for fusable dynamic multicore architectures. Proceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), pp. 460-471. DOI: 10.1109/HPCA.2013.6522341. Online publication date: 23-Feb-2013
    • (2013) An Initial Investigation of a Multi-layered Approach for Optimizing and Parallelizing Real-Time Media and Audio Applications. Proceedings of the 2013 Eighth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, pp. 479-484. DOI: 10.1109/3PGCIC.2013.82. Online publication date: 28-Oct-2013
    • (2010) Branch Predication. Speculative Execution in High Performance Computer Architectures, pp. 109-133. DOI: 10.1201/9781420035155.ch5. Online publication date: 14-Jan-2010
    • (2009) Lightweight predication support for out of order processors. 2009 IEEE 15th International Symposium on High Performance Computer Architecture, pp. 201-212. DOI: 10.1109/HPCA.2009.4798255. Online publication date: Feb-2009
    • (2007) Ginger. ACM SIGARCH Computer Architecture News, 35:2, pp. 436-447. DOI: 10.1145/1273440.1250716. Online publication date: 9-Jun-2007
    • (2007) Ginger. Proceedings of the 34th annual international symposium on Computer architecture, pp. 436-447. DOI: 10.1145/1250662.1250716. Online publication date: 9-Jun-2007
