research-article

Free Access

Optimal trace scheduling using enumeration

Authors:
Ghassan Shobaki

University of California, Davis and Sun Microsystems

University of California, Davis and Sun Microsystems
View Profile

,
Kent Wilken

University of California, Davis

University of California, Davis
View Profile

,
Mark Heffernan

University of California, Davis

University of California, Davis
View Profile

ACM Transactions on Architecture and Code Optimization Volume 5 Issue 4Article No.: 19pp 1–32https://doi.org/10.1145/1498690.1498694

Published:23 March 2009Publication History

ACM Transactions on Architecture and Code Optimization

Abstract

This article presents the first optimal algorithm for trace scheduling. The trace is a global scheduling region used by compilers to exploit instruction-level parallelism across basic block boundaries. Several heuristic techniques have been proposed for trace scheduling, but the precision of these techniques has not been studied relative to optimality. This article describes a technique for finding provably optimal trace schedules, where optimality is defined in terms of a weighted sum of schedule lengths across all code paths in a trace. The optimal algorithm uses branch-and-bound enumeration to efficiently explore the entire solution space. Experimental evaluation of the algorithm shows that, with a time limit of 1 s per problem, 91% of the hard trace scheduling problems in the SPEC CPU 2006 Integer Benchmarks are solved optimally. For 58% of these hard problems, the optimal schedule is improved compared to that produced by a heuristic scheduler with a geometric mean improvement of 3.2% in weighted schedule length and 18% in compensation code size.

References

Bernstein, D. and Rodeh, M. 1991. Global scheduling for super-scalar machines. In Proceedings of the Conference on Programming Language Design and Implementation. ACM, New York. Google ScholarDigital Library
Bharadwaj, J., Menezes, K., and Mckinsey, C. 2000. Wavefront scheduling: path based data representation and scheduling of subgraphs. J. Instruc.-Level Paral., 2, 7.Google Scholar
Chekuri, C., Johnson, R., Motwani, R., Natarajan, B., Rau, B., and Schlansker, M. 1996. Profile-driven instruction-level parallel scheduling with applications to superblocks. In Proceedings of the 29th International Symposium on Microarchitecture. IEEE, Los Alamitos, CA. Google ScholarDigital Library
Chou, H. and Chung, C. 1995. An optimal instruction scheduler for superscalar processor. IEEE Trans. Parall. Distrib. Syst. 6, 303--313. Google ScholarDigital Library
Fisher, J. 1981. Trace Scheduling: a technique for global micro-code compaction. IEEE Trans. Comput. 30, 478--490. Google ScholarDigital Library
Freudenberger, S., Gross, T., and Lowney, P. 1994. Avoidance and suppression of compensation code in a trace scheduling compiler. ACM Trans. Program. Lang. Syst. 16, 1156--1214. Google ScholarDigital Library
Havanki, W., Banerjia, S., and Conte, T., 1998. Treegion scheduling for wide-issue processors. In Proceedings of the 4th International Symposium on High-Performance Computer Architecture (HPCA-4), IEEE, Los Alamitos, CA. Google ScholarDigital Library
Hennessy, J. and Gross, T. 1983. Post-pass code optimization of pipeline constraints. ACM Trans. Program. Lang. Syst. 5, 422--448. Google ScholarDigital Library
Hwu, W., Mahlke, S., Chen, W., Chang, P., Warter, N., Bringmann, R., Ouellette, R., Hank, R., Kiyohara, T., Haab, G., Holm J., and Lavery, D. 1993. The superblock: an effective technique for VLIW and superscalar compilation. J. Supercomput. 7, 229--248. Google ScholarDigital Library
Lah, J. and Atkin, D. 1983. Tree compaction of micro-programs. In Proceedings of the 16th Annual Workshop on Micro-Programming. ACM, New York.Google Scholar
Langevin, M. and Cerny, E. 1996. A recursive technique for computing lower-bound performance of schedules. ACM Trans. Des. Autom. Electron. Syst. 1, 443--456. Google ScholarDigital Library
Linn, J. 1983. SRDAG Compaction—a generalization of trace scheduling to increase the use of global context information. In Proceedings of the 16th Annual Workshop on Micro-Programming. ACM, New York.Google Scholar
Mahlke, S., Lin, D., Chen, W., Hank, R., and Bringmann, R. 1992. Effective compiler support for predicated execution using the hyperblock. In Proceedings of the 25th Annual International Symposium on Microarchitecture. IEEE, Los Alamitos, CA. Google ScholarDigital Library
Muchnick, S. 1997. Advanced Compiler Design and Implementation. Morgan Kaufmann, San Francisco, CA. Google ScholarDigital Library
Narasimhan, M. and Ramanujam, J. 2001. A fast approach to computing exact solutions to the resource-constrained scheduling problem. ACM Trans. Des. Autom. Electron. Syst. 6, 490--500. Google ScholarDigital Library
Rim, M. and Jain, R. 1994. Lower-bound performance estimation for the high-level synthesis scheduling problem. IEEE Trans. Comput. Aid. Des. Integr. Circ. Syst. 13, 451--458.Google ScholarDigital Library
Shobaki, G. and Wilken, K. 2004. Optimal superblock scheduling using enumeraion. In Proceedings of the 37th International Symposium on Microarchitecture. IEEE, Los Alamitos, CA. Google ScholarDigital Library
Shobaki, G. 2006. Optimal global instruction scheduling using enumeration. PhD dissertation, Dept. of Computer Science, University of California, Davis. Google ScholarDigital Library
Smith, M., Horowitz, M., and Lam, M. 1992. Efficient superscalar performance through boosting. In Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, New York. Google ScholarDigital Library
Su, B., Ding, S., and Jin, L. 1984. An improvement of trace scheduling for global microcode compaction. In Proceedings of the 17th Annual Workshop on Microprogramming. ACM, New York. Google ScholarDigital Library
Sun Microsystems, 2004. UltraSPARC Processors Documentation, www.sun.com/processors/documentation.html.Google Scholar
Wilken, K., Liu, J., and Heffernan, M. 2000. Optimal instruction scheduling using integer programming. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York. Google ScholarDigital Library
Winkel S. 2007. Optimal versus heuristic global code scheduling. In Proceedings of the 40th International Symposium on Microarchitecture. IEEE, Los Alamitos, CA. Google ScholarDigital Library
Wolsey, L. 1998. Integer Programming. John Wiley and Sons, New York.Google Scholar

Index Terms

Optimal trace scheduling using enumeration

Recommendations

Preallocation instruction scheduling with register pressure minimization using a combinatorial optimization approach

Balancing Instruction-Level Parallelism (ILP) and register pressure during preallocation instruction scheduling is a fundamentally important problem in code generation and optimization. The problem is known to be NP-complete. Many heuristic techniques ...
Read More
Exploring an Alternative Cost Function for Combinatorial Register-Pressure-Aware Instruction Scheduling

Multiple combinatorial algorithms have been proposed for doing pre-allocation instruction scheduling with the objective of minimizing register pressure or balancing register pressure and instruction-level parallelism. The cost function that is minimized ...
Read More
Warp-aware trace scheduling for GPUs
PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation

GPU performance depends not only on thread/warp level parallelism (TLP) but also on instruction-level parallelism (ILP). It is not enough to schedule instructions within basic blocks, it is also necessary to exploit opportunities for ILP optimization ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Architecture and Code Optimization Volume 5, Issue 4
March 2009
122 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/1498690
Issue’s Table of Contents

Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 March 2009
- Accepted: 1 July 2008
- Revised: 1 March 2008
- Received: 1 March 2006
Published in taco Volume 5, Issue 4

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Instruction scheduling
branch-and-bound enumeration
compiler optimizations
global instruction scheduling
instruction-level parallelism
optimal instruction scheduling
trace scheduling
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 14
  Total Citations
  View Citations
- 793
  Total Downloads
- Downloads (Last 12 months)97
- Downloads (Last 6 weeks)12
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Optimal trace scheduling using enumeration

ACM Transactions on Architecture and Code Optimization

Abstract

References

Cited By

Index Terms

Recommendations

Preallocation instruction scheduling with register pressure minimization using a combinatorial optimization approach

Exploring an Alternative Cost Function for Combinatorial Register-Pressure-Aware Instruction Scheduling

Warp-aware trace scheduling for GPUs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Optimal trace scheduling using enumeration

ACM Transactions on Architecture and Code Optimization

Abstract

References

Cited By

Index Terms

Recommendations

Preallocation instruction scheduling with register pressure minimization using a combinatorial optimization approach

Exploring an Alternative Cost Function for Combinatorial Register-Pressure-Aware Instruction Scheduling

Warp-aware trace scheduling for GPUs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media