DOI: 10.1145/2712386.2712400
research-article

RaftLib: a C++ template library for high performance stream parallel processing

Published: 07 February 2015

ABSTRACT

Stream processing, or data-flow programming, is a compute paradigm that has been around for decades in many forms, yet it has failed to garner the same attention as mainstream languages and libraries (e.g., C++ or OpenMP [15]). Stream processing has great promise: the ability to safely exploit extreme levels of parallelism. There have been many implementations, both libraries and full languages. The full languages implicitly assume that the streaming paradigm cannot be fully exploited in legacy languages, while library approaches are often preferred because they integrate with the vast expanse of legacy code that exists in the wild. Libraries, however, are often criticized for yielding to the shape of their respective languages. RaftLib aims to fully exploit the stream processing paradigm, enabling a full spectrum of streaming graph optimizations while providing a platform for exploring integration with legacy C/C++ code. RaftLib is built as a C++ template library, enabling end users to utilize the robust C++ standard library along with RaftLib's pipeline-parallel framework. RaftLib supports dynamic queue optimization, automatic parallelization, and real-time, low-overhead performance monitoring.
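
To make the pipeline-parallel idea concrete, the following is a minimal sketch written only against the C++17 standard library. It does not use or reproduce RaftLib's API: `BoundedQueue` and `run_pipeline` are hypothetical names introduced for illustration, and the fixed queue capacity of 4 is an arbitrary assumption (RaftLib, by contrast, is described as resizing queues dynamically). Three stages run on separate threads and communicate only through FIFO queues, which is what lets them overlap safely.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded FIFO linking two pipeline stages (not a RaftLib type).
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Blocks while the queue is full, applying back-pressure to the producer.
    void push(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(value));
        not_empty_.notify_one();
    }

    // Signals that no further items will arrive.
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        not_empty_.notify_all();
    }

    // Blocks until an item is available; returns std::nullopt once the
    // queue has been closed and fully drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        not_full_.notify_one();
        return value;
    }

private:
    std::size_t capacity_;
    bool closed_ = false;
    std::queue<T> items_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};

// Three-stage pipeline: generate 1..n, square each value, collect results.
std::vector<int> run_pipeline(int n) {
    BoundedQueue<int> raw(4);      // source -> square (capacity is arbitrary)
    BoundedQueue<int> squared(4);  // square -> sink
    std::vector<int> results;

    std::thread source([&] {
        for (int i = 1; i <= n; ++i) raw.push(i);
        raw.close();
    });
    std::thread square([&] {
        while (auto v = raw.pop()) squared.push(*v * *v);
        squared.close();
    });
    std::thread sink([&] {
        while (auto v = squared.pop()) results.push_back(*v);
    });

    source.join();
    square.join();
    sink.join();
    return results;  // only the sink wrote to it, and it has been joined
}
```

Calling `run_pipeline(5)` yields the squares of 1 through 5 in order, since each queue has a single producer and a single consumer. The sketch needs a C++17 compiler and, on most POSIX toolchains, the `-pthread` flag.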

References

  1. W. B. Ackerman. Data flow languages. Computer, 15(2):15--25, 1982.
  2. V. Adve, A. Carle, E. Granston, S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, J. Mellor-Crummey, S. Warren, and C.-W. Tseng. Requirements for data-parallel programming environments. Technical report, DTIC Document, 1994.
  3. K. Agrawal, J. Fineman, and J. Maglalang. Cache-conscious scheduling of streaming pipelines on parallel machines with private caches. In Proc. of IEEE Int'l Conf. on High Performance Computing, 2014.
  4. A. V. Aho and M. J. Corasick. Efficient string matching: an aid to bibliographic search. Communications of the ACM, 18(6):333--340, 1975.
  5. P. Arató, S. Juhász, Z. Á. Mann, A. Orbán, and D. Papp. Hardware-software partitioning in embedded system design. In IEEE International Symposium on Intelligent Signal Processing, pages 197--202. IEEE, 2003.
  6. D. C. Arnold, H. Casanova, and J. Dongarra. Innovations of the NetSolve grid computing system. Concurrency and Computation: Practice and Experience, 14(13-15):1457--1479, 2002.
  7. K. Asanovic, R. Bodik, J. Demmel, T. Keaveny, K. Keutzer, J. Kubiatowicz, N. Morgan, D. Patterson, K. Sen, J. Wawrzynek, D. Wessel, and K. Yelick. A view of the parallel computing landscape. Communications of the ACM, 52(10):56--67, 2009.
  8. J. C. Beard and R. D. Chamberlain. Analysis of a simple approach to modeling performance for streaming data applications. In Proc. of IEEE Int'l Symp. on Modelling, Analysis and Simulation of Computer and Telecommunication Systems, pages 345--349, Aug. 2013.
  9. J. C. Beard and R. D. Chamberlain. Use of a Levy distribution for modeling best case execution time variation. In A. Horváth and K. Wolter, editors, Computer Performance Engineering, volume 8721 of Lecture Notes in Computer Science, pages 74--88. Springer International, 2014.
  10. J. C. Beard, C. Epstein, and R. D. Chamberlain. Automated reliability classification of queueing models for streaming computation using support vector machines. In Proceedings of the 6th ACM/SPEC international conference on Performance engineering, ICPE '15, New York, NY, USA, Jan. 2015. ACM. to be published.
  11. J. Bosboom, S. Rajadurai, W.-F. Wong, and S. Amarasinghe. StreamJIT: A commensal compiler for high-performance stream programming. In Proc. of ACM International Conference on Object Oriented Programming Systems Languages & Applications, pages 177--195. ACM, 2014.
  12. I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan. Brook for GPUs: Stream computing on graphics hardware. ACM Trans. on Graphics, 23(3):777--786, 2004.
  13. A. Chakrabarti, G. Cormode, and A. McGregor. Robust lower bounds for communication and stream computation. In Proc. of 40th ACM Symposium on Theory of Computing, pages 641--650. ACM, 2008.
  14. R. D. Chamberlain, J. M. Lancaster, and R. K. Cytron. Visions for application development on hybrid computing systems. Parallel Comput., 34(4-5):201--216, May 2008.
  15. R. Chandra. Parallel Programming in OpenMP. Morgan Kaufmann, 2001.
  16. Working Draft, Standard for Programming Language C++. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf. Accessed October 2014.
  17. G. De Micheli and R. K. Gupta. Hardware/software co-design. Proceedings of the IEEE, 85(3):349--365, 1997.
  18. J. B. Dennis. First version of a data flow procedure language. In Programming Symposium, pages 362--376. Springer, 1974.
  19. J. B. Dennis. Data flow supercomputers. Computer, 13(11):48--56, 1980.
  20. H. Esmaeilzadeh, E. Blem, R. St Amant, K. Sankaralingam, and D. Burger. Dark silicon and the end of multicore scaling. In 38th International Symposium on Computer Architecture (ISCA), pages 365--376. IEEE, 2011.
  21. J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. Graph distances in the streaming model: The value of space. In Proc. of 16th ACM-SIAM Symposium on Discrete Algorithms, pages 745--754, Philadelphia, PA, USA, 2005. SIAM.
  22. M. Franklin, E. Tyson, J. Buckley, P. Crowley, and J. Maschmeyer. Auto-Pipe and the X language: A pipeline design tool and description language. In Proc. of Int'l Parallel and Distributed Processing Symp., Apr. 2006.
  23. I. Fumihiko, S. Nakagawa, and K. Hagihara. GPU-Chariot: A programming framework for stream applications running on multi-GPU systems. IEICE Transactions on Information and Systems, 96(12):2604--2616, 2013.
  24. M. B. Gokhale, J. M. Stone, J. Arnold, and M. Kalinowski. Stream-oriented FPGA computing in the Streams-C high level language. In Proc. of IEEE Symp. on Field-Programmable Custom Computing Machines, pages 49--56, Apr. 2000.
  25. L. Hochstein, J. Carver, F. Shull, S. Asgari, V. Basili, J. K. Hollingsworth, and M. V. Zelkowitz. Parallel programmer productivity: A case study of novice parallel programmers. In Proc. of ACM/IEEE Supercomputing Conference, pages 35--35. IEEE, 2005.
  26. A. Hormati, M. Kudlur, S. Mahlke, D. Bacon, and R. Rabbah. Optimus: efficient realization of streaming applications on FPGAs. In Proc. of Int'l Conf. on Compilers, Architectures and Synthesis for Embedded Systems, pages 41--50, 2008.
  27. R. N. Horspool. Practical fast searching in strings. Software: Practice and Experience, 10(6):501--506, 1980.
  28. K. Knobe and C. Offner. Compiling to tstreams, a new model of parallel computation. Technical report, 2005.
  29. J. M. Lancaster, E. F. B. Shands, J. D. Buhler, and R. D. Chamberlain. TimeTrial: A low-impact performance profiler for streaming data applications. In Proc. IEEE Int'l Conf. on Application-specific Systems, Architectures and Processors, Sept. 2011.
  30. J. M. Lancaster, J. G. Wingbermuehle, J. C. Beard, and R. D. Chamberlain. Crossing boundaries in TimeTrial: Monitoring communications across architecturally diverse computing platforms. In Proc. 9th IEEE/IFIP Int'l Conf. Embedded and Ubiquitous Computing, Oct. 2011.
  31. S. S. Lavenberg. A perspective on queueing models of computer performance. Performance Evaluation, 10(1):53--76, 1989.
  32. E. A. Lee and D. G. Messerschmitt. Synchronous data flow. Proc. IEEE, 75(9), 1987.
  33. C. E. Leiserson. The Cilk++ concurrency platform. The Journal of Supercomputing, 51(3):244--257, 2010.
  34. P. Li, K. Agrawal, J. Buhler, and R. D. Chamberlain. Deadlock avoidance for streaming computations with filtering. In ACM Symp. on Parallelism in Algorithms and Architectures, 2010.
  35. P. Li, K. Agrawal, J. Buhler, and R. D. Chamberlain. Adding data parallelism to streaming pipelines for throughput optimization. In Proc. of IEEE Int'l Conf. on High Performance Computing, 2013.
  36. J. R. McGraw. Data-flow computing: the VAL language. ACM Transactions on Programming Languages and Systems, 4(1):44--82, 1982.
  37. L. A. Meyerovich and A. S. Rabkin. Empirical analysis of programming language adoption. In Proc. of ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, pages 1--18. ACM, 2013.
  38. S. Padmanabhan, Y. Chen, and R. D. Chamberlain. Optimal design-space exploration of streaming applications. In Proc. IEEE Int'l Conf. Application-specific Systems, Architectures and Processors, Sept. 2011.
  39. S. Padmanabhan, Y. Chen, and R. D. Chamberlain. Convexity in non-convex optimizations of streaming applications. In Proc. of 18th IEEE Int'l Conf. on Parallel and Distributed Systems, pages 668--675, Dec. 2012.
  40. S. Padmanabhan, Y. Chen, and R. D. Chamberlain. Unchaining in design-space optimization of streaming applications. In Proc. of Workshop on Data-Flow Execution Models for Extreme Scale Computing, Sept. 2013.
  41. O. Pell and O. Mencer. Surviving the end of frequency scaling with reconfigurable dataflow computing. ACM SIGARCH Computer Architecture News, 39(4):60--65, 2011.
  42. RaftLib. http://www.raftlib.io. Accessed November 2014.
  43. J. Reinders. Intel Threading Building Blocks: Outfitting C++ For Multi-core Processor Parallelism. O'Reilly Media, Inc., 2007.
  44. Samza. http://samza.incubator.apache.org. Accessed November 2014.
  45. Stack Exchange Data Dump. https://archive.org/download/stackexchange/stackoverflow.com-PostHistory.7z. Accessed November 2014.
  46. Storm: Distributed and fault-tolerant realtime computation. https://storm.apache.org. Accessed November 2014.
  47. O. Tange. GNU Parallel - the command-line power tool. ;login: The USENIX Magazine, 36(1):42--47, Feb 2011.
  48. W. Thies and S. Amarasinghe. An empirical characterization of stream programs and its implications for language and compiler design. In Proc. of 19th International Conference on Parallel Architectures and Compilation Techniques, pages 365--376. ACM, 2010.
  49. W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: A language for streaming applications. In R. Horspool, editor, Proc. of Int'l Conf. on Compiler Construction, volume 2304 of Lecture Notes in Computer Science, pages 49--84. 2002.
  50. TIOBE Programming Community index. http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html. Accessed October 2014.
  51. J. G. Wingbermuehle, R. D. Chamberlain, and R. K. Cytron. ScalaPipe: A streaming application generator. In Proc. Symp. on Application Accelerators in High-Performance Computing, July 2012.
  52. K. Yelick, L. Semenzato, G. Pike, C. Miyamoto, B. Liblit, A. Krishnamurthy, P. Hilfinger, S. Graham, D. Gay, P. Colella, and A. Aiken. Titanium: a high-performance Java dialect. Concurrency: Practice and Experience, 10(11-13):825--836, 1998.

Published in

PMAM '15: Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores
February 2015
186 pages
ISBN: 9781450334044
DOI: 10.1145/2712386

Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

PMAM '15 paper acceptance rate: 19 of 34 submissions, 56%. Overall acceptance rate: 53 of 97 submissions, 55%.
