skip to main content
10.1145/1233501.1233559acmconferencesArticle/Chapter ViewAbstractPublication PagesiccadConference Proceedingsconference-collections
Article

Loop pipelining for high-throughput stream computation using self-timed rings

Published: 05 November 2006 Publication History

Abstract

We present a technique for increasing the throughput of stream processing architectures by removing the bottlenecks caused by loop structures. We implement loops as self-timed pipelined rings that can operate on multiple data sets concurrently. Our contribution includes a transformation algorithm which takes as input a high-level program and gives as output the structure of an optimized pipeline ring. Our technique handles nested loops and is further enhanced by loop unrolling. Simulations run on benchmark examples show a 1.3 to 4.9x speedup without unrolling and a 2.6 to 9.7x speedup with twofold loop unrolling.

References

[1]
Handshake Solutions, a Philips subsidiary. http://www.handshakesolutions.com/.
[2]
A. Aiken and A. Nicolau. Perfect pipelining: A new loop parallelization technique. In European Symposium on Programming, pages 221--235, 1988.
[3]
V. H. Allan, R. B. Jones, R. M. Lee, and S. J. Allan. Software pipelining. ACM Comput. Surv., 27(3):367--432, 1995.
[4]
M. Budiu. Spatial Computation. PhD thesis, Carnegie Mellon University, Computer Science Department, December 2003. Technical report CMU-CS-03-217.
[5]
A. Davis and S. M. Nowick. An introduction to asynchronous circuit design. Technical Report UUCS-97-013, Dept. of Computer Science, University of Utah, Sept. 1997.
[6]
J. L. Hennessy and D. A. Patterson. Computer architecture: a quantitative approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002.
[7]
U. J. Kapasi, W. J. Dally, S. Rixner, P. R. Mattson, J. D. Owens, and B. Khailany. Efficient conditional operations for data-parallel architectures. In Proc. of IEEE/ACM Intl. Symp. on Microarchitecture, 2000.
[8]
K. Kennedy and J. R. Allen. Optimizing compilers for modern architectures: a dependence-based approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002.
[9]
M. Theobald and S. M. Nowick. Transformations for the synthesis and optimization of asynchronous distributed control. In Proc. ACM/IEEE Design Automation Conf., June 2001.
[10]
T. E. Williams. Self-Timed Rings and their Application to Division. PhD thesis, Stanford University, June 1991.
[11]
T. E. Williams and M. A. Horowitz. A zero-overhead self-timed 160ns 54b CMOS divider. IEEE Journal of Solid-State Circuits, 26(11):1651--1661, Nov. 1991.
[12]
K. Y. Yun, P. A. Beerel, V. Vakilotojar, A. E. Dooply, and J. Arceo. The design and verification of a high-performance low-control-overhead asynchronous differential equation solver. In Proc. Int. Symp. on Advanced Research in Asynchronous Circuits and Systems, pages 140--153. IEEE Computer Society Press, Apr. 1997.
[13]
K. Y. Yun, P. A. Beerel, V. Vakilotojar, A. E. Dooply, and J. Arceo. The design and verification of a high-performance low-control-overhead asynchronous differential equation solver. IEEE Trans. on VLSI Systems, 6(4):643--655, Dec. 1998.

Cited By

View all
  • (2014)On self-timed ring for consistent mapping and maximum throughput2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications10.1109/RTCSA.2014.6910511(1-9)Online publication date: Aug-2014
  • (2013)Resistive Computing: Memristors-Enabled Signal MultiplicationIEEE Transactions on Circuits and Systems I: Regular Papers10.1109/TCSI.2013.224443460:5(1241-1249)Online publication date: May-2013
  • (2011)Field Programmable Stateful Logic ArrayIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2011.216506730:12(1800-1813)Online publication date: 1-Dec-2011
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
ICCAD '06: Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
November 2006
147 pages
ISBN:1595933891
DOI:10.1145/1233501
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 November 2006

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

ICCAD06
Sponsor:

Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)7
  • Downloads (Last 6 weeks)0
Reflects downloads up to 22 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2014)On self-timed ring for consistent mapping and maximum throughput2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications10.1109/RTCSA.2014.6910511(1-9)Online publication date: Aug-2014
  • (2013)Resistive Computing: Memristors-Enabled Signal MultiplicationIEEE Transactions on Circuits and Systems I: Regular Papers10.1109/TCSI.2013.224443460:5(1241-1249)Online publication date: May-2013
  • (2011)Field Programmable Stateful Logic ArrayIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2011.216506730:12(1800-1813)Online publication date: 1-Dec-2011
  • (2008)Performance estimation and slack matching for pipelined asynchronous architectures with choiceProceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design10.5555/1509456.1509560(449-456)Online publication date: 10-Nov-2008
  • (2008)Concurrency-Enhancing Transformations for Asynchronous Behavioral SpecificationsProceedings of the 2008 14th IEEE International Symposium on Asynchronous Circuits and Systems10.1109/ASYNC.2008.20(15-25)Online publication date: 7-Apr-2008

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media