research-article

RaftLib: a C++ template library for high performance stream parallel processing

Authors:
Jonathan C. Beard, Washington University, St. Louis
Peng Li, Washington University, St. Louis
Roger D. Chamberlain, Washington University, St. Louis
PMAM '15: Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, February 2015, Pages 96–105. https://doi.org/10.1145/2712386.2712400

Published: 7 February 2015

ABSTRACT

Stream processing, or data-flow programming, is a compute paradigm that has been around for decades in many forms, yet it has failed to garner the same attention as mainstream languages and libraries (e.g., C++ or OpenMP [15]). Stream processing has great promise: the ability to safely exploit extreme levels of parallelism. There have been many implementations, both libraries and full languages. The full languages implicitly assume that the streaming paradigm cannot be fully exploited in legacy languages, while library approaches are often preferred because they integrate with the vast expanse of legacy code that exists in the wild. Libraries, however, are often criticized for yielding to the shape of their respective languages. RaftLib aims to fully exploit the stream processing paradigm, enabling a full spectrum of streaming graph optimizations while providing a platform for exploring integration with legacy C/C++ code. RaftLib is built as a C++ template library, enabling end users to use the robust C++ standard library along with RaftLib's pipeline-parallel framework. RaftLib supports dynamic queue optimization, automatic parallelization, and real-time, low-overhead performance monitoring.
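To make the pipeline-parallel idea concrete, the following is a minimal sketch written only against the C++17 standard library. It is not RaftLib's API: `BoundedStream` and `sum_of_squares_pipeline` are hypothetical names introduced here for illustration, and the fixed queue capacity stands in for what a streaming runtime might tune dynamically. Each compute kernel runs in its own thread and communicates with its neighbors only through bounded FIFO queues, which is the property that lets stages execute in parallel without sharing state.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical bounded FIFO linking two kernels (illustrative, not a RaftLib type).
template <typename T>
class BoundedStream {
public:
    explicit BoundedStream(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks while the queue is full, applying back-pressure to the producer.
    void push(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(value));
        not_empty_.notify_one();
    }

    // Signals that the producer will send no more items.
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        not_empty_.notify_all();
    }

    // Returns the next item, or std::nullopt once the stream is closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        not_full_.notify_one();
        return value;
    }

private:
    std::size_t capacity_;
    std::queue<T> items_;
    bool closed_ = false;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};

// Three kernels, one thread each: generate 1..n -> square -> sum.
long long sum_of_squares_pipeline(int n, std::size_t queue_capacity = 4) {
    BoundedStream<int> numbers(queue_capacity);
    BoundedStream<long long> squares(queue_capacity);
    long long total = 0;

    std::thread source([&] {
        for (int i = 1; i <= n; ++i) numbers.push(i);
        numbers.close();
    });
    std::thread square([&] {
        while (auto v = numbers.pop()) {
            squares.push(static_cast<long long>(*v) * *v);
        }
        squares.close();
    });
    std::thread sink([&] {
        while (auto v = squares.pop()) total += *v;
    });

    source.join();
    square.join();
    sink.join();
    return total;  // Safe to read: sink has been joined.
}
```

Calling `sum_of_squares_pipeline(10)` returns 385. On some toolchains the program must be linked with thread support (for example `-pthread`). The queue capacity is the kind of parameter the abstract's "dynamic queue optimization" would presumably adjust at run time; here it is a plain constructor argument.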

References

[1] W. B. Ackerman. Data flow languages. Computer, 15(2):15--25, 1982.
[2] V. Adve, A. Carle, E. Granston, S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, J. Mellor-Crummey, S. Warren, and C.-W. Tseng. Requirements for data-parallel programming environments. Technical report, DTIC Document, 1994.
[3] K. Agrawal, J. Fineman, and J. Maglalang. Cache-conscious scheduling of streaming pipelines on parallel machines with private caches. In Proc. of IEEE Int'l Conf. on High Performance Computing, 2014.
[4] A. V. Aho and M. J. Corasick. Efficient string matching: an aid to bibliographic search. Communications of the ACM, 18(6):333--340, 1975.
[5] P. Arató, S. Juhász, Z. Á. Mann, A. Orbán, and D. Papp. Hardware-software partitioning in embedded system design. In IEEE International Symposium on Intelligent Signal Processing, pages 197--202. IEEE, 2003.
[6] D. C. Arnold, H. Casanova, and J. Dongarra. Innovations of the NetSolve grid computing system. Concurrency and Computation: Practice and Experience, 14(13-15):1457--1479, 2002.
[7] K. Asanovic, R. Bodik, J. Demmel, T. Keaveny, K. Keutzer, J. Kubiatowicz, N. Morgan, D. Patterson, K. Sen, J. Wawrzynek, D. Wessel, and K. Yelick. A view of the parallel computing landscape. Communications of the ACM, 52(10):56--67, 2009.
[8] J. C. Beard and R. D. Chamberlain. Analysis of a simple approach to modeling performance for streaming data applications. In Proc. of IEEE Int'l Symp. on Modelling, Analysis and Simulation of Computer and Telecommunication Systems, pages 345--349, Aug. 2013.
[9] J. C. Beard and R. D. Chamberlain. Use of a Levy distribution for modeling best case execution time variation. In A. Horváth and K. Wolter, editors, Computer Performance Engineering, volume 8721 of Lecture Notes in Computer Science, pages 74--88. Springer International, 2014.
[10] J. C. Beard, C. Epstein, and R. D. Chamberlain. Automated reliability classification of queueing models for streaming computation using support vector machines. In Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, ICPE '15, New York, NY, USA, Jan. 2015. ACM. To be published.
[11] J. Bosboom, S. Rajadurai, W.-F. Wong, and S. Amarasinghe. StreamJIT: A commensal compiler for high-performance stream programming. In Proc. of ACM International Conference on Object Oriented Programming Systems Languages & Applications, pages 177--195. ACM, 2014.
[12] I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan. Brook for GPUs: Stream computing on graphics hardware. ACM Trans. on Graphics, 23(3):777--786, 2004.
[13] A. Chakrabarti, G. Cormode, and A. McGregor. Robust lower bounds for communication and stream computation. In Proc. of 40th ACM Symposium on Theory of Computing, pages 641--650. ACM, 2008.
[14] R. D. Chamberlain, J. M. Lancaster, and R. K. Cytron. Visions for application development on hybrid computing systems. Parallel Comput., 34(4-5):201--216, May 2008.
[15] R. Chandra. Parallel Programming in OpenMP. Morgan Kaufmann, 2001.
[16] Working Draft, Standard for Programming Language C++. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf. Accessed October 2014.
[17] G. De Micheli and R. K. Gupta. Hardware/software co-design. Proceedings of the IEEE, 85(3):349--365, 1997.
[18] J. B. Dennis. First version of a data flow procedure language. In Programming Symposium, pages 362--376. Springer, 1974.
[19] J. B. Dennis. Data flow supercomputers. Computer, 13(11):48--56, 1980.
[20] H. Esmaeilzadeh, E. Blem, R. St Amant, K. Sankaralingam, and D. Burger. Dark silicon and the end of multicore scaling. In 38th International Symposium on Computer Architecture (ISCA), pages 365--376. IEEE, 2011.
[21] J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. Graph distances in the streaming model: The value of space. In Proc. of 16th ACM-SIAM Symposium on Discrete Algorithms, pages 745--754, Philadelphia, PA, USA, 2005. SIAM.
[22] M. Franklin, E. Tyson, J. Buckley, P. Crowley, and J. Maschmeyer. Auto-Pipe and the X language: A pipeline design tool and description language. In Proc. of Int'l Parallel and Distributed Processing Symp., Apr. 2006.
[23] I. Fumihiko, S. Nakagawa, and K. Hagihara. GPU-Chariot: A programming framework for stream applications running on multi-GPU systems. IEICE Transactions on Information and Systems, 96(12):2604--2616, 2013.
[24] M. B. Gokhale, J. M. Stone, J. Arnold, and M. Kalinowski. Stream-oriented FPGA computing in the Streams-C high level language. In Proc. of IEEE Symp. on Field-Programmable Custom Computing Machines, pages 49--56, Apr. 2000.
[25] L. Hochstein, J. Carver, F. Shull, S. Asgari, V. Basili, J. K. Hollingsworth, and M. V. Zelkowitz. Parallel programmer productivity: A case study of novice parallel programmers. In Proc. of ACM/IEEE Supercomputing Conference, pages 35--35. IEEE, 2005.
[26] A. Hormati, M. Kudlur, S. Mahlke, D. Bacon, and R. Rabbah. Optimus: efficient realization of streaming applications on FPGAs. In Proc. of Int'l Conf. on Compilers, Architectures and Synthesis for Embedded Systems, pages 41--50, 2008.
[27] R. N. Horspool. Practical fast searching in strings. Software: Practice and Experience, 10(6):501--506, 1980.
[28] K. Knobe and C. Offner. Compiling to tstreams, a new model of parallel computation. Technical report, 2005.
[29] J. M. Lancaster, E. F. B. Shands, J. D. Buhler, and R. D. Chamberlain. TimeTrial: A low-impact performance profiler for streaming data applications. In Proc. IEEE Int'l Conf. on Application-specific Systems, Architectures and Processors, Sept. 2011.
[30] J. M. Lancaster, J. G. Wingbermuehle, J. C. Beard, and R. D. Chamberlain. Crossing boundaries in TimeTrial: Monitoring communications across architecturally diverse computing platforms. In Proc. 9th IEEE/IFIP Int'l Conf. Embedded and Ubiquitous Computing, Oct. 2011.
[31] S. S. Lavenberg. A perspective on queueing models of computer performance. Performance Evaluation, 10(1):53--76, 1989.
[32] E. A. Lee and D. G. Messerschmitt. Synchronous data flow. Proc. IEEE, 75(9), 1987.
[33] C. E. Leiserson. The Cilk++ concurrency platform. The Journal of Supercomputing, 51(3):244--257, 2010.
[34] P. Li, K. Agrawal, J. Buhler, and R. D. Chamberlain. Deadlock avoidance for streaming computations with filtering. In ACM Symp. on Parallelism in Algorithms and Architectures, 2010.
[35] P. Li, K. Agrawal, J. Buhler, and R. D. Chamberlain. Adding data parallelism to streaming pipelines for throughput optimization. In Proc. of IEEE Int'l Conf. on High Performance Computing, 2013.
[36] J. R. McGraw. Data-flow computing: the VAL language. ACM Transactions on Programming Languages and Systems, 4(1):44--82, 1982.
[37] L. A. Meyerovich and A. S. Rabkin. Empirical analysis of programming language adoption. In Proc. of ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, pages 1--18. ACM, 2013.
[38] S. Padmanabhan, Y. Chen, and R. D. Chamberlain. Optimal design-space exploration of streaming applications. In Proc. IEEE Int'l Conf. Application-specific Systems, Architectures and Processors, Sept. 2011.
[39] S. Padmanabhan, Y. Chen, and R. D. Chamberlain. Convexity in non-convex optimizations of streaming applications. In Proc. of 18th IEEE Int'l Conf. on Parallel and Distributed Systems, pages 668--675, Dec. 2012.
[40] S. Padmanabhan, Y. Chen, and R. D. Chamberlain. Unchaining in design-space optimization of streaming applications. In Proc. of Workshop on Data-Flow Execution Models for Extreme Scale Computing, Sept. 2013.
[41] O. Pell and O. Mencer. Surviving the end of frequency scaling with reconfigurable dataflow computing. ACM SIGARCH Computer Architecture News, 39(4):60--65, 2011.
[42] RaftLib. http://www.raftlib.io. Accessed November 2014.
[43] J. Reinders. Intel Threading Building Blocks: Outfitting C++ For Multi-core Processor Parallelism. O'Reilly Media, Inc., 2007.
[44] Samza. http://samza.incubator.apache.org. Accessed November 2014.
[45] Stack Exchange Data Dump. https://archive.org/download/stackexchange/stackoverflow.com-PostHistory.7z. Accessed November 2014.
[46] Storm: Distributed and fault-tolerant realtime computation. https://storm.apache.org. Accessed November 2014.
[47] O. Tange. GNU parallel - the command-line power tool. ;login: The USENIX Magazine, 36(1):42--47, Feb. 2011.
[48] W. Thies and S. Amarasinghe. An empirical characterization of stream programs and its implications for language and compiler design. In Proc. of 19th International Conference on Parallel Architectures and Compilation Techniques, pages 365--376. ACM, 2010.
[49] W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: A language for streaming applications. In R. Horspool, editor, Proc. of Int'l Conf. on Compiler Construction, volume 2304 of Lecture Notes in Computer Science, pages 49--84. 2002.
[50] TIOBE Programming Community index. http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html. Accessed October 2014.
[51] J. G. Wingbermuehle, R. D. Chamberlain, and R. K. Cytron. ScalaPipe: A streaming application generator. In Proc. Symp. on Application Accelerators in High-Performance Computing, July 2012.
[52] K. Yelick, L. Semenzato, G. Pike, C. Miyamoto, B. Liblit, A. Krishnamurthy, P. Hilfinger, S. Graham, D. Gay, P. Colella, and A. Aiken. Titanium: a high-performance Java dialect. Concurrency: Practice and Experience, 10(11-13):825--836, 1998.

Published in
PMAM '15: Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores
February 2015
186 pages
ISBN:9781450334044
DOI:10.1145/2712386
Editors:
Pavan Balaji, Argonne National Laboratory
Minyi Guo, Shanghai Jiao Tong University, China
Zhiyi Huang, University of Otago, New Zealand
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
PMAM '15 paper acceptance rate: 19 of 34 submissions, 56%. Overall acceptance rate: 53 of 97 submissions, 55%.