Global productiveness propagation: a code optimization technique to speculatively prune useless narrow computations

Authors:
Indu Bhagat, Universitat Politecnica de Catalunya, Barcelona, Spain
Enric Gibert, Intel Labs-UPC, Barcelona, Spain
Jesús Sánchez, Intel Labs-UPC, Barcelona, Spain
Antonio González, Intel Labs-UPC, Barcelona, Spain

LCTES '11: Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems, April 2011, Pages 161–170, https://doi.org/10.1145/1967677.1967700

Published: 11 April 2011


ABSTRACT

This paper proposes a unique hardware-software collaborative strategy to remove useless work at 16-bit data-width granularity. The underlying motivation is to design a low-power execution platform by exploiting 'narrow' computations. The proposal uses a strictly narrow bit-wide microarchitecture (16-bit integer datapath), which yields a low-cost, low-complexity, low-power execution engine. Software dynamically maps 64-bit computations onto this datapath by translating them into an equivalent 16-bit instruction stream and optimizing that stream.
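
To make the idea of mapping a 64-bit computation onto a 16-bit datapath concrete, here is a minimal Python sketch, not the authors' translator. It assumes unsigned values split into four 16-bit chunks, least significant first; the names `split16`, `join16`, `add64_as_16`, and `changed_chunks` are hypothetical. It also shows why narrow granularity exposes useless work: adding a small constant often leaves the upper chunks unchanged.

```python
MASK16 = 0xFFFF  # one 16-bit chunk


def split16(value):
    """Split an unsigned 64-bit value into four 16-bit chunks, low chunk first."""
    return [(value >> (16 * i)) & MASK16 for i in range(4)]


def join16(chunks):
    """Reassemble four 16-bit chunks (low chunk first) into one 64-bit value."""
    out = 0
    for i, chunk in enumerate(chunks):
        out |= (chunk & MASK16) << (16 * i)
    return out


def add64_as_16(a, b):
    """Add two 64-bit values using only 16-bit adds and a carry bit (wraps mod 2**64)."""
    carry = 0
    result = []
    for x, y in zip(split16(a), split16(b)):
        s = x + y + carry
        result.append(s & MASK16)  # keep the low 16 bits of this chunk sum
        carry = s >> 16            # propagate the carry to the next chunk
    return join16(result)


def changed_chunks(a, b):
    """Indices of the chunks of a that actually change when b is added."""
    before = split16(a)
    after = split16(add64_as_16(a, b))
    return [i for i in range(4) if before[i] != after[i]]


if __name__ == "__main__":
    # Only the low chunk changes; the three upper chunk adds did no useful work.
    print(changed_chunks(0x1234_5678_0000_0010, 5))
```

In this toy case three of the four narrow additions reproduce the value already held in their chunk, which is the kind of work the abstract calls non-productive.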

In this paper, we propose Global Productiveness Propagation (GPP), a dynamic, profile-based optimization technique that infers the minimum required dataflow by pruning narrow computations that are most probably useless (non-productive). More precisely, GPP speculatively prunes the static backward slices of selected narrow computations: those whose result (in their respective storage location) is the same value as at the input of the region. The technique is formulated around 'narrow' computations because they allow useful (productive) and useless (non-productive) work to be distinguished at a finer granularity. GPP has been evaluated on an in-order narrow bit-wide execution core, achieving an average dynamic instruction stream reduction of 6.6% and improving overall performance by 4.2%.
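
The pruning step can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not describe the region format, the profile, or the check-and-recovery mechanism that any speculative optimization needs, so all of those are assumptions or omitted here. The sketch assumes a straight-line region of narrow operations, each storage location written at most once, and a hypothetical profile set `likely_unchanged` naming live-out locations that usually hold their region-input value at region exit. The names `Op` and `prune_region` are illustrative only.

```python
from typing import NamedTuple


class Op(NamedTuple):
    """One narrow operation: writes dest, reads srcs (hypothetical toy IR)."""
    dest: str
    srcs: tuple


def prune_region(ops, live_out, likely_unchanged):
    """Keep only the backward slices of live-outs that are assumed productive.

    Live-outs in likely_unchanged are speculated to keep their input value,
    so their defining ops are dropped unless a kept op reads them.
    Assumes each location is written at most once in the region.
    """
    needed = set(live_out) - set(likely_unchanged)
    kept = []
    for op in reversed(ops):
        if op.dest in needed:
            needed.discard(op.dest)  # this op supplies the needed value
            needed.update(op.srcs)   # its inputs must now be available
            kept.append(op)
    kept.reverse()
    return kept


# A 64-bit add held in 16-bit chunks a0..a3 (low first), with carries c0..c2.
region = [
    Op("c0", ("a0", "b0")),  # carry out of the low chunk
    Op("a0", ("a0", "b0")),  # low chunk sum
    Op("c1", ("a1", "c0")),
    Op("a1", ("a1", "c0")),
    Op("c2", ("a2", "c1")),
    Op("a2", ("a2", "c1")),
    Op("a3", ("a3", "c2")),
]

# Assumed profile: the upper three chunks almost never change.
kept = prune_region(region,
                    live_out={"a0", "a1", "a2", "a3"},
                    likely_unchanged={"a1", "a2", "a3"})

if __name__ == "__main__":
    print([op.dest for op in kept])
```

Under these assumptions only the low-chunk add survives, because the carry chain exists solely to feed the chunks speculated to be non-productive. A real implementation would also have to detect the cases where the speculation is wrong (here, a carry out of the low chunk) and recover.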

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published in
LCTES '11: Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
April 2011, 182 pages
ISBN: 9781450305556
DOI: 10.1145/1967677
General Chair: Jan Vitek, Purdue University, USA
Program Chair: Bjorn De Sutter, Ghent University, Belgium

ACM SIGPLAN Notices, Volume 46, Issue 5
LCTES '10
May 2011, 170 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2016603

Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
narrow bitwide computation
profile-guided optimization
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 116 of 438 submissions, 26%