research-article

Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures

Authors:
Jongsoo Park

Stanford University, Stanford, CA, USA

Stanford University, Stanford, CA, USA
View Profile

,
William J. Dally

Stanford University, Stanford, CA, USA

Stanford University, Stanford, CA, USA
View Profile

SPAA '10: Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architecturesJune 2010Pages 1–10https://doi.org/10.1145/1810479.1810481

Published:13 June 2010Publication History

SPAA '10: Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures

Pages 1–10

ABSTRACT

We present a scheduling algorithm of stream programs for multi-core architectures called team scheduling. Compared to previous multi-core stream scheduling algorithms, team scheduling achieves 1) similar synchronization overhead, 2) coverage of a larger class of applications, 3) better control over buffer space, 4) deadlock-free feedback loops, and 5)lower latency. We compare team scheduling to the latest stream scheduling algorithm, sgms, by evaluating 14 applications on a multi-core architecture with 16 cores. Team scheduling successfully targets applications that cannot be validly scheduled by sgms due to excessive buffer requirement or deadlocks in feedback loops (e.g., gsm and w-cdma). For applications that can be validly scheduled by sgms, team scheduling shows on average 37% higher throughput within the same buffer space constraints.

References

J. Balfour, W. J. Dally, D. Black-Schaffer, V. Parikh, and J. Park. An Energy-Efficient Processor Architecture for Embedded Systems. Computer Architecture Letters, 7(1):29--32, 2008. Google ScholarDigital Library
R. Bellman. On a Routing Problem. Quarterly of Applied Mathematics, 16(1):87--90, 1958.Google ScholarCross Ref
S. S. Bhattacharyya, J. T. Buck, S. Ha, and E. A. Lee. Generating Compact Code from Dataflow Specifications of Multirate Signal Processing Algorithms. IEEE Transactions on Circuits and Systems, 42(3):138--150, 1995.Google ScholarCross Ref
S. S. Bhattacharyya, P. K. Murthy, and E. A. Lee. APGAN and RPMC: Complementary Heuristics for Translating DSP Block Diagrams into Efficient Software Implementations. Design Automation for Embedded Systems, 2(1):33--60, 1997.Google ScholarDigital Library
S. S. Bhattacharyya, S. Sriram, and E. A. Lee. Optimizing Synchronization in Multiprocessor DSP Systems. IEEE Transactions on Signal Processing, 45(6):1605--1618, 1997. Google ScholarDigital Library
R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Efficiently Computing Static Single Assignment Form and the Control Dependence Graph. ACM Transactions on Programming Language and Systems (TOPLAS), 13(4):451--490, 1991. Google ScholarDigital Library
W. J. Dally and B. Towles. Principles and Practices of Interconnection Networks. Morgan Kaufmann, 2004. Google ScholarDigital Library
ELM Webpage. Concurrent VLSI Architecture Group, Stanford University. http://cva.stanford.edu/projects/elm/software.htm.Google Scholar
K. Fatahalian, T. J. Knight, M. Houston, M. Erez, D. R. Horn, L. Leem, J. Y. Park, M. Ren, A. Aiken, W. J. Dally, and P. Hanrahan. Sequoia: Programming the Memory Hierarchy. In Conference on Supercomputing (SC), 2006. Google ScholarDigital Library
M. I. Gordon, W. Thies, and S. P. Amarasinghe. Exploiting Coarse-grained Task, Data, and Pipeline Parallelism in Stream Programs. In International Conference on Architecture Support for Programming Language and Operating Systems (ASPLOS), pages 151--162, 2006. Google ScholarDigital Library
M. I. Gordon, W. Thies, M. Karczmarek, J. Lin, A. S. Meli, A. A. Lamb, C. Leger, J. Wong, H. Hoffman, D. Maze, and S. P. Amarasinghe. A Stream Compiler for Communication-Exposed Architectures. In International Conference on Architecture Support for Programming Language and Operating Systems (ASPLOS), pages 291--303, 2002. Google ScholarDigital Library
M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. MiBench: A Free, Commercially Representative Embedded Benchmark Suite. In IEEE Annual Workshop on Workload Characterization, pages 83--94, 2001. Google ScholarDigital Library
M. Herlihy and N. Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann, 2008. Google ScholarDigital Library
H. P. Hofstee. Power Efficient Processor Architecture and the CELL Processor. In International Symposium on High-Performance Computer Architectures (HPCA), pages 258--262, 2005. Google ScholarDigital Library
J. B. Kam and J. D. Ullman. Monotone Data Flow Analysis Frameworks. Acta Informatica, 7(3):305--317, 1977.Google ScholarDigital Library
M. Karczmarek, W. Thies, and S. P. Amarasinghe. Phased Scheduling of Stream Programs. In Conference on Language, Compiler, and Tool Support for Embedded Systems (LCTES), pages 103--112, 2003. Google ScholarDigital Library
G. Karypis and V. Kumar. METIS: Unstructured Graph Partitioning and Sparse Matrix Ordering System. Technical report, Department of Computer Science, University of Minnesota, 1995.Google Scholar
B. Khailany, W. J. Dally, U. J. Kapasi, P. Mattson, J. Namkoong, J. D. Owens, B. Towles, A. Chang, and S. Rixner. Imagine: Media Processing with Streams. IEEE Micro, 21(2):35--46, 2001. Google ScholarDigital Library
M.-Y. Ko, C.-C. Shen, and S. S. Bhattacharyya. Memory-constrained Block Processing for DSP Software Optimization. Journal of Signal Processing Systems, 50(2):163--177, 2008. Google ScholarDigital Library
M. Kudlur and S. Mahlke. Orchestrating the Execution of Stream Programs on Multicore Platforms. In Conference on Programming Language Design and Implementation (PLDI), pages 114--124, 2008. Google ScholarDigital Library
K. N. Lalgudi, M. C. Papaefthymiou, and M. Potkonjak. Optimizing Computations for Effective Block-Processing. ACM Transactions on Design Automation of Electronic Systems, 5(3):604--630, 2000. Google ScholarDigital Library
M. Lam. Software Pipelining: An Effective Scheduling Technique on VLIW Machines. In Conference on Programming Language Design and Implementation (PLDI), pages 318--328, 1988. Google ScholarDigital Library
C. Lattner and V. Adve. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In International Symposium on Code Generation and Optimization (CGO), pages 75--86, 2004. Google ScholarDigital Library
E. A. Lee. A Coupled Hardware and Software Architecture for Programmable Digital Signal Processors. PhD thesis, University of California, Berkeley, 1986. Google ScholarDigital Library
E. A. Lee and D. G. Messerschmitt. Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing. IEEE Transactions on Computers, 36(1):24--35, 1987. Google ScholarDigital Library
T. Lengauer and R. E. Tarjan. A Fast Algorithm for Finding Dominators in Flowgraph. ACM Transactions on Programming Language and Systems (TOPLAS), 1(1):121--141, 1979. Google ScholarDigital Library
Y. Lin, M. Kudlur, S. Mahlke, and T. Mudge. Hierarchical Coarse-grained Stream Compilation for Software Defined Radio. In International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), pages 115--124, 2007. Google ScholarDigital Library
P. Mattson. A Programming System for the Imagine Media Processor. PhD thesis, Stanford University, 2002. Google ScholarDigital Library
A. V. Oppenheim, A. S. Willsky, and S. H. Nawab. Signals & Systems. Prentice Hall, 1997. Google ScholarDigital Library
J. Park, J. Balfour, and W. J. Dally. Maximizing the Filter Rate of L0 Compiler-Managed Instruction Stores by Pinning. Technical Report 126, Concurrent VLSI Architecture Group, Stanford University, 2009.Google Scholar
J. L. Pino, S. S. Bhattacharyya, and E. A. Lee. A Hierarchical Multiprocessor Scheduling Systems for DSP Applications. Technical Report UCB/ERL M95/36, University of California, Berkeley, 1995. Google ScholarDigital Library
B. R. Rau. Iterative Modulo Scheduling: An Algorithm for Software Pipelining Loops. In International Symposium on Microarchitecture (MICRO), pages 63--74, 1994. Google ScholarDigital Library
S. Ritz, M. Pankert, V. ÇZivojnović, and H. Meyr. Optimum Vectorization of Scalable Synchronous Dataflow Graphs. In International Conference on Application-Specific Array Processors (ASAP), pages 285--296, 1993.Google Scholar
S. Sriram and S. S. Bhattacharyya. Embedded Mutiprocesors: Scheduling and Synchronization. CRC, 2009. Google ScholarDigital Library
W. Thies, M. Karczmarek, and S. P. Amarasinghe. StreamIt: A Language for Streaming Applications. In International Conference on Compiler Construction (CC), pages 179--196, 2002. Google ScholarDigital Library
Y.-P. E. Wang and T. Ottosson. Cell Search in W-CDMA. IEEE Journal on Selected Areas in Communications, 18(8):1470--1482, 2000. Google ScholarDigital Library

Index Terms

Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

Dynamic scheduling of stream programs on embedded multi-core processors
CODES+ISSS '12: Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis

Stream computing has emerged as an important model of computation for embedded system applications particularly in the multimedia and network processing domains. In recent past several programming languages and embedded multi-core processors have been ...
Read More
Efficient Compilation of Stream Programs for Heterogeneous Architectures: A Model-Checking based approach
SCOPES '15: Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems

Stream programming based on the synchronous data flow (SDF) model naturally exposes data, task and pipeline parallelism. Statically scheduling stream programs for homogeneous architectures has been an area of extensive research. With graphic processing ...
Read More
Synergistic execution of stream programs on multicores with accelerators
LCTES '09: Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems

The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general purpose multicore architectures. The StreamIt graphs describe task, data and pipeline parallelism which can be exploited on accelerators such as ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SPAA '10: Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
June 2010
378 pages
ISBN:9781450300797
DOI:10.1145/1810479
General Chairs:
Friedhelm Meyer auf der Heide
University of Paderborn, Germany
,
Cynthia Phillips
Sandia National Laboratories, USA
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 June 2010
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
compiler and tools for concurrent programming
green computing and power-efficient architectures
multi-core architectures
stream programming
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate447of1,461submissions,31%
Upcoming Conference
SPAA '24

Sponsor:

sigact

sigact

36th ACM Symposium on Parallelism in Algorithms and Architectures

June 17 - 21, 2024

Nantes , France
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 18
  Total Citations
  View Citations
- 429
  Total Downloads
- Downloads (Last 12 months)7
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures

SPAA '10: Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures

ABSTRACT

References

Cited By

Index Terms

Recommendations

Dynamic scheduling of stream programs on embedded multi-core processors

Efficient Compilation of Stream Programs for Heterogeneous Architectures: A Model-Checking based approach

Synergistic execution of stream programs on multicores with accelerators

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures

SPAA '10: Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures

ABSTRACT

References

Cited By

Index Terms

Recommendations

Dynamic scheduling of stream programs on embedded multi-core processors

Efficient Compilation of Stream Programs for Heterogeneous Architectures: A Model-Checking based approach

Synergistic execution of stream programs on multicores with accelerators

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media