skip to main content
10.1145/1454115.1454121acmconferencesArticle/Chapter ViewAbstractPublication PagespactConference Proceedingsconference-collections
research-article

Exploiting loop-dependent stream reuse for stream processors

Published: 25 October 2008 Publication History

Abstract

The memory access limits the performance of stream processors. By exploiting the reuse of data held in the Stream Register File (SRF), an on-chip storage, the number of memory accesses can be reduced. In current stream compilers reuse is only attempted for simple stream references, those whose start and end are known. Compiler analysis from outside of stream processors does not directly enable the consideration of other complex stream references. In this paper we propose a transformation to automatically optimize stream programs to exploit the reuse supplied by loop-dependent stream references. The transformation is based on three results: algorithms to recognize the reuse supplied by stream references, a new abstract expression called the Stream Reuse Graph (SRG) to depict the reuse and the optimization of the SRG for the transformation. Both the reuse between whole sequences accessed by stream references and that between partial sequences are exploited in the paper. In particular, the problem of exploiting partial stream reuse does not have its parallel in the traditional data reuse exploitation setting (for scalars and arrays). Finally, we have implemented our techniques using the StreamC/KernelC compiler for Imagine. Experimental results show a resultant speedup of 1.14 to 2.54 times using a range of typical stream processing application kernels.

References

[1]
J. H. Ahn, M. Erez, and W. J. Dally. The design space of data-parallel memory systems. In SC '06: Proceedings of the 2006 ACM/IEEE conference on Supercomputing, page 80, New York, NY, USA, 2006. ACM Press.
[2]
J. R. Allen. Dependence analysis for subscripted variables and its application to program transformations. PhD thesis, Houston, TX, USA, 1983.
[3]
R. Allen and K. Kennedy. Vector register allocation. IEEE Trans. Comput., 41(10):1290--1317, 1992.
[4]
I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan. Brook for gpus: stream computing on graphics hardware. ACM Trans. Graph., 23(3):777--786, 2004.
[5]
D. Callahan, S. Carr, and K. Kennedy. Improving register allocation for subscripted variables. In PLDI'90: Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation, pages 53--65, New York, NY, USA, 1990. ACM.
[6]
S. Carr and K. Kennedy. Scalar replacement in the presence of conditional control flow. Softw. Pract. Exper., 24(1):51--77, 1994.
[7]
T. F. Chan, E. Gallopoulos, V. Simoncini, T. Szeto, and C. H. Tong. A quasi-minimal residual variant of the bi-cgstab algorithm for nonsymmetric systems. SIAM J. Sci. Comput., 15(2):338--347, 1994.
[8]
R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K Zadeck. Efficiently computing static single assignment form and the control dependence graph. Technical report, Providence, RI, USA, 1991.
[9]
W. J. Dally, F. Labonte, A. Das, P. Hanrahan, J. H. Ahn, J. Gummaraju, M. Erez, N. Jayasena, I. Buck, T. J. K., and U. J. Kapasi. Merrimac: Supercomputing with streams. In SC '03: Proceedings of the 2003 ACM/IEEE conference on Supercomputing, page 35, Washington, DC, USA, 2003. IEEE Computer Society.
[10]
A. Das, W. J. Dally, and P. Mattson. Compiling for stream processing. In PACT '06: Proceedings of the 15th international conference on Parallel architectures and compilation techniques, pages 33--42, New York, NY, USA, 2006. ACM Press.
[11]
D. Gannon, W. Jalby, and K. Gallivan. Strategies for cache and local memory management by global program transformation. J. Parallel Distrib. Comput., 5(5):587--616, 1988.
[12]
N. Jayasena, M. Erez, J. H. Ahn, and W. J. Dally. Stream register files with indexed access. In HPCA '04: Proceedings of the 10th International Symposium on High Performance Computer Architecture, page 60, Washington, DC, USA, 2004. IEEE Computer Society.
[13]
J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, and D. Shippy. Introduction to the cell multiprocessor. IBM J. Res. Dev., 49(4/5):589--604, 2005.
[14]
U. J. Kapasi, W. J. Dally, S. Rixner, J. D. Owens, and B. Khailany. The imagine stream processor. In ICCD, pages 282--288, 2002.
[15]
U. J. Kapasi, S. Rixner, W. J. Dally, B. Khailany, J. H. Ahn, P. Mattson, and J. D. Owens. Programmable stream processors. Computer, 36(8):54--62, 2003.
[16]
S. W. Liao, Z. H. Du, G. S. Wu, and G. Y. Lueh. Data and computation transformations for brook streaming applications on multiprocessors. In CGO '06: Proceedings of the International Symposium on Code Generation and Optimization, pages 196--207, Washington, DC, USA, 2006. IEEE Computer Society.
[17]
R. Lo, F. Chow, R. Kennedy, S. Liu, and P. Tu. Register promotion by sparse partial redundancy elimination of loads and stores. In PLDI '98: Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, pages 26--37, New York, NY, USA, 1998. ACM.
[18]
J. Lu and K. D. Cooper. Register promotion in c programs. In PLDI '97: Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation, pages 308--319, New York, NY, USA, 1997. ACM.
[19]
P. Mattson. A programming system for the imagine media processor. PhD thesis, Computer Systems Laboratory, Stanford University, 2002. Adviser-William J. Dally.
[20]
P. Mattson and et al. Imagine Programming System Developer's Guide, 2004.
[21]
M. Narayanan, L. Oliker, A. Janin, P. Husbands, X Ye, and S. Li. Scientific kernels on viram and imagine media processors. Lawrence Berkeley National Laboratory, 2002.
[22]
J. D. Owens, U. J. Kapasi, P. Mattson, B. Towles, B. Serebrin, S. Rixner, and W. J. Dally. Media processing applications on the Imagine stream processor. In Proceedings of the IEEE International Conference on Computer Design, pages 295--302, September 2002.
[23]
S. Rixner. Stream Processor Architecture. Kluwer Academic Publishers, 2002.
[24]
A. V. S. Sastry and R. D. C. Ju. A new algorithm for scalar register promotion based on ssa form. In PLDI '98: Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, pages 15--25, New York, NY, USA, 1998. ACM.
[25]
R. Stephens. A survey of stream processing. Acta Informatica, 34(7):491--541, 1997.
[26]
M. B. Taylor, J. Kim, J. Miller, D. Wentzlaff, Fae Ghodrat, Ben Greenwald, Henry Hoffman, Paul Johnson, Jae-Wook Lee, Walter Lee, Albert Ma, Arvind Saraf, Mark Seneski, Nathan Shnidman, Volker Strumpen, Matt Frank, Saman Amarasinghe, and Anant Agarwal. The raw microprocessor: A computational fabric for software circuits and general-purpose programs. IEEE Micro, 22(2):25--35, 2002.
[27]
M. E. Wolf and M. S. Lam. A data locality optimizing algorithm. In PLDI '91: Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation, pages 30--44, New York, NY, USA, 1991. ACM.
[28]
W. A. Wulf and S. A. McKee. Hitting the memory wall: implications of the obvious. SIGARCH Comput. Archit. News, 23(1):20--24, 1995.
[29]
J. Xue and C.-H. Huang. Reuse-driven tiling for data locality. In 10th Workshop on Languages and Compilers for Parallel Computing, Lecture Notes in Computer Science 1366, pages 16--33, Minneapolis, Minn., 1997. Springer-Verlag.
[30]
X. J. Yang, X. B. Yan, Z. C. Xing, Y. Deng, J. Jiang, and Y. Zhang. A 64-bit stream processor architecture for scientific applications. In ISCA '07, volume 35, pages 210--219, New York, NY, USA, 2007. ACM Press.

Cited By

View all
  • (2012)Comparability Graph Coloring for Optimizing Utilization of Software-Managed Stream Register Files for Stream ProcessorsACM Transactions on Architecture and Code Optimization10.1145/2133382.21333879:1(1-30)Online publication date: 1-Mar-2012
  • (2011)Parallel bubble sort using stream programming paradigm2011 5th International Conference on Application of Information and Communication Technologies (AICT)10.1109/ICAICT.2011.6111024(1-5)Online publication date: Oct-2011
  • (2010)Reuse-aware modulo scheduling for stream processorsProceedings of the Conference on Design, Automation and Test in Europe10.5555/1870926.1871197(1112-1117)Online publication date: 8-Mar-2010
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques
October 2008
328 pages
ISBN:9781605582825
DOI:10.1145/1454115
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 October 2008

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. StreamC
  2. stream professor
  3. stream programming model
  4. stream register file
  5. stream reuse

Qualifiers

  • Research-article

Conference

PACT '08
Sponsor:

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)6
  • Downloads (Last 6 weeks)0
Reflects downloads up to 08 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2012)Comparability Graph Coloring for Optimizing Utilization of Software-Managed Stream Register Files for Stream ProcessorsACM Transactions on Architecture and Code Optimization10.1145/2133382.21333879:1(1-30)Online publication date: 1-Mar-2012
  • (2011)Parallel bubble sort using stream programming paradigm2011 5th International Conference on Application of Information and Communication Technologies (AICT)10.1109/ICAICT.2011.6111024(1-5)Online publication date: Oct-2011
  • (2010)Reuse-aware modulo scheduling for stream processorsProceedings of the Conference on Design, Automation and Test in Europe10.5555/1870926.1871197(1112-1117)Online publication date: 8-Mar-2010
  • (2010)Reuse-aware modulo scheduling for stream processors2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)10.1109/DATE.2010.5456975(1112-1117)Online publication date: Mar-2010
  • (2009)Comparability graph coloring for optimizing utilization of stream register files in stream processorsACM SIGPLAN Notices10.1145/1594835.150419544:4(111-120)Online publication date: 14-Feb-2009
  • (2009)Comparability graph coloring for optimizing utilization of stream register files in stream processorsProceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming10.1145/1504176.1504195(111-120)Online publication date: 14-Feb-2009
  • (2008)Exploiting the reuse supplied by loop-dependent stream references for stream processorsACM Transactions on Architecture and Code Optimization (TACO)10.1145/1839667.18396737:2(1-35)Online publication date: 5-Oct-2008

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media