skip to main content
10.5555/1326073.1326165acmconferencesArticle/Chapter ViewAbstractPublication PagesiccadConference Proceedingsconference-collections
research-article

Operation chaining asynchronous pipelined circuits

Published: 05 November 2007 Publication History

Abstract

We define operation chaining (op-chaining) as an optimization problem to determine the optimal pipeline depth for balancing performance against energy demands in pipelined asynchronous designs. Since there are no clock period requirements, asynchronous pipeline stages can have non-uniform latencies. We exploit this fact to coalesce several stages together thereby saving power and area due to the elimination of control-path resources from the pipeline. The trade-off is potentially reduced pipeline parallelism.
In this paper, we formally define this optimization as a graph covering problem, which finds sub-graphs that will be synthesized as an opchained pipeline stage. We then define the solution space for provably correct solutions and present an algorithm to efficiently search this space. The search technique partitions the graph based on post-dominator relationships to find sub-graphs that are potential op-chain candidates. We use knowledge of the Global Critical Path (GCP) [13] to evaluate the performance impact of accepting a candidate sub-graph and formulate a heuristic cost function to model this trade-off. The algorithm has a quadratic-time complexity in the size of the dataflow graph. We have implemented this algorithm within an automated asynchronous synthesis toolchain [12]. Experimental evidence from applying the algorithm on several media processing kernels reveals that the average energy-delay and energy-delay-area products improve by about 1.4x and 1.8x respectively, with a maximum improvement of 5x and 18x.

References

[1]
M. Budiu, G. Venkataramani, et al. Spatial computation. In ASPLOS, pp. 14--26, October 2004.
[2]
E.-S. Chang, D. Gajski, et al. An optimal clock period selection method based on slack minimization criteria. TODAES, 1(3):352--370, 1996.
[3]
T. Chelcea and S. M. Nowick. Resynthesis and peephole transformations for the optimization of large-scale asynchronous systems. In DAC, pp. 405--410, 2002.
[4]
S. Furber and P. Day. Four-phase micropipeline latch control circuits. IEEE Transactions on Very Large Scale Integration Systems, 4-2:247--253, 1996.
[5]
G. Venkataramani and S. C. Goldstein. Leveraging protocol knowledge in slack matching. In ICCAD, pp. 724--729, 2006.
[6]
S. Kim and P. A. Beerel. Pipeline optimization for asynchronous circuits: Complexity analysis, and an efficient optimal algorithm. In ICCAD, pp. 296--302, 2000.
[7]
C. Lee, M. Potkonjak, et al. MediaBench: a tool for evaluating and synthesizing multimedia and communications systems. In MICRO, pp. 330--335, 1997.
[8]
N. Maheshwari and S. S. Sapatnekar. Minimum area retiming with equivalent initial states. In ICCAD, pp. 216--219, 1997.
[9]
S. S. Muchnick. Advanced compiler design and implementation. 1997.
[10]
C. D. Nielsen and M. Kishinevsky. Performance analysis based on timing simulation. In DAC, pp. 70--76, 1994.
[11]
R. K. Ranjan, V. Singhal, et al. On the optimization power of retiming and resynthesis transformations. In ICCAD, pp. 402--407, 1998.
[12]
G. Venkataramani, M. Budiu, et al. C to asynchronous dataflow circuits: An end-to-end toolflow. In IWLS, June 2004.
[13]
G. Venkataramani, M. Budiu, et al. Global Critical Path: A Tool for System-Level Timing Analysis. In DAC, June 2007.
[14]
T. E. Williams. Self-timed rings and their application to division. PhD thesis, Stanford University, 1991.
[15]
A. Xie, S. Kim, et al. Bounding average time separations of events in stochastic timed petri nets with choice. In ASYNC, p. 94, 1999.
[16]
Y. Yi, I. Nousias, et al. System-level scheduling on instruction cell based reconfigurable systems. In DATE, pp. 381--386, 2006.

Cited By

View all
  • (2011)Performance-driven clustering of asynchronous circuitsProceedings of the 21st international conference on Integrated circuit and system design: power and timing modeling, optimization, and simulation10.5555/2045364.2045374(92-101)Online publication date: 26-Sep-2011
  • (2008)Slack analysis in the system design loopProceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis10.1145/1450135.1450189(231-236)Online publication date: 19-Oct-2008

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
ICCAD '07: Proceedings of the 2007 IEEE/ACM international conference on Computer-aided design
November 2007
933 pages
ISBN:1424413826
  • General Chair:
  • Georges Gielen

Sponsors

Publisher

IEEE Press

Publication History

Published: 05 November 2007

Check for updates

Qualifiers

  • Research-article

Conference

ICCAD07
Sponsor:

Acceptance Rates

ICCAD '07 Paper Acceptance Rate 139 of 510 submissions, 27%;
Overall Acceptance Rate 457 of 1,762 submissions, 26%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 07 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2011)Performance-driven clustering of asynchronous circuitsProceedings of the 21st international conference on Integrated circuit and system design: power and timing modeling, optimization, and simulation10.5555/2045364.2045374(92-101)Online publication date: 26-Sep-2011
  • (2008)Slack analysis in the system design loopProceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis10.1145/1450135.1450189(231-236)Online publication date: 19-Oct-2008

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media