skip to main content
10.1145/2435264.2435273acmconferencesArticle/Chapter ViewAbstractPublication PagesfpgaConference Proceedingsconference-collections
research-article

Polyhedral-based data reuse optimization for configurable computing

Authors Info & Claims
Published:11 February 2013Publication History

ABSTRACT

Many applications, such as medical imaging, generate intensive data traffic between the FPGA and off-chip memory. Significant improvements in the execution time can be achieved with effective utilization of on-chip (scratchpad) memories, associated with careful software-based data reuse and communication scheduling techniques. We present a fully automated C-to-FPGA framework to address this problem. Our framework effectively implements data reuse through aggressive loop transformation-based program restructuring. In addition, our proposed framework automatically implements critical optimizations for performance such as task-level parallelization, loop pipelining, and data prefetching.

We leverage the power and expressiveness of the polyhedral compilation model to develop a multi-objective optimization system for off-chip communications management. Our technique can satisfy hardware resource constraints (scratchpad size) while still aggressively exploiting data reuse. Our approach can also be used to reduce the on-chip buffer size subject to bandwidth constraint. We also implement a fast design space exploration technique for effective optimization of program performance using the Xilinx high-level synthesis tool.

References

  1. Center for domain-specific computing. http://cdsc.ucla.edu.Google ScholarGoogle Scholar
  2. Convey. http://www.conveycomputer.com.Google ScholarGoogle Scholar
  3. http://www.xilinx.com/products/design-tools/ise-design-suite/index.htm.Google ScholarGoogle Scholar
  4. Pocc 1.1. http://pocc.sourceforge.net.Google ScholarGoogle Scholar
  5. An independent evaluation of the autoesl autopilot high-level synthesis tool. Technical report, Berkeley Design Technology, Inc., 2010.Google ScholarGoogle Scholar
  6. N. Ahmed, N. Mateev, and K. Pingali. Tiling imperfectly-nested loop nests. In ACM/IEEE Conf. on Supercomputing (SC'00), Dallas, TX, USA, Nov. 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. C. Alias, A. Darte, and A. Plesco. Optimizing remote accesses for offloaded kernels: application to high-level synthesis for fpga. SIGPLAN Not., 47(8):285--286, Feb. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures. Morgan Kaufmann Publishers, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories. In ACM Symposium on Principles and practice of parallel programming, pages 1--10. ACM, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. C. Bastoul. Code generation in the polyhedral model is easier than you think. In IEEE Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT'04), pages 7--16, Sept. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. S. Bayliss and G. A. Constantinides. Optimizing sdram bandwidth for custom fpga loop accelerators. In Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays, FPGA '12, pages 195--204, New York, NY, USA, 2012. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan. A practical automatic polyhedral program optimization system. In ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. E. Brockmeyer, M. Miranda, and F. Catthoor. Layer assignment techniques for low energy in multi-layered memory organisations. In Design, Automation and Test in Europe Conference and Exhibition, 2003, pages 1070--1075, 2003. DATE. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. F. Catthoor, K. Danckaert, K. Kulkarni, E. Brockmeyer, P. Kjeldsberg, T. v. Achteren, and T. Omnes. Data access and storage management for embedded programmable processors. Kluwer Academic Publishers, Norwell, MA, USA, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. F. Catthoor, E. d. Greef, and S. Suytack. Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design. Kluwer Academic Publishers, Norwell, MA, USA, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Cong, K. Guruaj, M. Huang, S. Li, B. Xiao, and Y. Zou. Domain-specific processor with 3d integration for medical image processing. In IEEE Intl. Conf. on Application-Specific Systems, Architectures and Processors, pages 247--250, sept. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Cong, M. Huang, and Y. Zou. Accelerating fluid registration algorithm on multi-fpga platforms. In Proc. of Intl. Conf. on Field Programmable Logic and Applications (FPL'11). IEEE, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. Cong, B. Liu, S. Neuendorffer, J. Noguera, K. Vissers, and Z. Zhang. High-level synthesis for fpgas: From prototyping to deployment. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 30(4):473--491, april 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. J. Cong, P. Zhang, and Y. Zou. Optimizing memory hierarchy allocation with loop transformations for high-level synthesis. In Design Automation Conference (DAC'12), June 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. A. Darte, R. Schreiber, and G. Villard. Lattice-based memory allocation. IEEE Trans. Comput., 54(10):1242--1257, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. P. Diniz, M. Hall, J. Park, B. So, and H. Ziegler. Bridging the gap between compilation and synthesis in the defacto system. In LCPC'03, pages 52--70. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. P. Feautrier. Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time. Int. J. Parallel Program., 21(5):389--420, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. S. Girbal, N. Vasilache, C. Bastoul, A. Cohen, D. Parello, M. Sigler, and O. Temam. Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies. Intl. J. of Parallel Programming, 34(3), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. A. Grosslinger. Precise Management of Scratchpad Memories for Localising Array Accesses in Scientific Codes. In Compiler Construction, pages 236--250, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. A.-C. Guillou, F. Quilleré, P. Quinton, S. Rajopadhye, and T. Risset. Hardware design methodology with the Alpha language. In FDL'01, Lyon, France, Sept. 2001.Google ScholarGoogle Scholar
  26. Q. Hu, P. G. Kjeldsberg, A. Vandecappelle, M. Palkovic, and F. Catthoor. Incremental hierarchical memory size estimation for steering of loop transformations. ACM Trans. Des. Autom. Electron. Syst., 12, September 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. F. Irigoin and R. Triolet. Supernode partitioning. In ACM SIGPLAN Principles of Programming Languages, pages 319--329, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. I. Issenin, E. Brockmeyer, M. Miranda, and N. Dutt. Drdu: A data reuse analysis technique for efficient scratch-pad memory management. ACM Trans. Des. Autom. Electron. Syst., 12, April 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. M. Kandemir and A. Choudhary. Compiler-directed scratch pad memory hierarchy design and management. In Design Automation Conference, 2002. Proceedings. 39th, pages 628--633, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. I. Kodukula, N. Ahmed, and K. Pingali. Data-centric multi-level blocking. In ACM SIGPLAN'97 Conf. on Programming Language Design and Implementation, pages 346--357, Las Vegas, June 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Q. Liu, G. A. Constantinides, K. Masselos, and P. Cheung. Combining data reuse with data-level parallelization for fpga-targeted hardware compilation: A geometric programming framework. Trans. Comp.-Aided Design of Integr. Circuits and Systems, 28(3):305--315, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. M. Palkovic, F. Catthoor, and H. Corporaal. Trade-offs in loop transformations. ACM Trans. Des. Autom. Electron. Syst., 14:22:1--22:30, April 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. P. R. Panda, N. D. Dutt, and A. Nicolau. Local memory exploration and optimization in embedded systems. IEEE Trans. on CAD of Integrated Circuits and Systems, 18:3--13, January 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. PolyOpt: A complete source-to-source Polyhedral Compiler, http://www.cse.ohio-state.edu/pouchet/polyopt.Google ScholarGoogle Scholar
  35. L.-N. Pouchet, C. Bastoul, A. Cohen, and N. Vasilache. Iterative optimization in the polyhedral model: Part I, one-dimensional time. In IEEE/ACM Intl. Symp. on Code Generation and Optimization (CGO'07), pages 144--156, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. B. So, M. W. Hall, and P. C. Diniz. A compiler approach to fast hardware design space exploration in fpga-based systems. In Programming Language Design and Implementation, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. K. Trifunovic, D. Nuzman, A. Cohen, A. Zaks, and I. Rosen. Polyhedral-model guided loop-nest auto-vectorization. In IEEE Intl. Conf. on Parallel Architectures and Compilation Techniques, pages 327--337, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. S. Verdoolaege. isl: An integer set library for the polyhedral model. In Mathematical Software - ICMS 2010, pages 299--302, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. M. Wolf and M. Lam. A data locality optimizing algorithm. In ACM SIGPLAN'91 Conf. on Programming Language Design and Implementation, pages 30--44, New York, June 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. M. Wolfe. Iteration space tiling for memory hierarchies. In 3rd SIAM Conf. on Parallel Processing for Scientific Computing, pages 357--361, Dec. 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. W. Zuo, Y. Liang, P. Li, K. Rupnow, D. Chen, and J. Cong. Improving High Level Synthesis Optimization Opportunity Through Polyhedral Transformations. In Proc. of the ACM/SIGDA Intl. Symp. on Field Programmable Gate Arrays (FPGA'13), 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Polyhedral-based data reuse optimization for configurable computing

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
          February 2013
          294 pages
          ISBN:9781450318877
          DOI:10.1145/2435264

          Copyright © 2013 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 11 February 2013

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate125of627submissions,20%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader