skip to main content
10.1145/1810479.1810481acmconferencesArticle/Chapter ViewAbstractPublication PagesspaaConference Proceedingsconference-collections
research-article

Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures

Published:13 June 2010Publication History

ABSTRACT

We present a scheduling algorithm of stream programs for multi-core architectures called team scheduling. Compared to previous multi-core stream scheduling algorithms, team scheduling achieves 1) similar synchronization overhead, 2) coverage of a larger class of applications, 3) better control over buffer space, 4) deadlock-free feedback loops, and 5)lower latency. We compare team scheduling to the latest stream scheduling algorithm, sgms, by evaluating 14 applications on a multi-core architecture with 16 cores. Team scheduling successfully targets applications that cannot be validly scheduled by sgms due to excessive buffer requirement or deadlocks in feedback loops (e.g., gsm and w-cdma). For applications that can be validly scheduled by sgms, team scheduling shows on average 37% higher throughput within the same buffer space constraints.

References

  1. J. Balfour, W. J. Dally, D. Black-Schaffer, V. Parikh, and J. Park. An Energy-Efficient Processor Architecture for Embedded Systems. Computer Architecture Letters, 7(1):29--32, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. R. Bellman. On a Routing Problem. Quarterly of Applied Mathematics, 16(1):87--90, 1958.Google ScholarGoogle ScholarCross RefCross Ref
  3. S. S. Bhattacharyya, J. T. Buck, S. Ha, and E. A. Lee. Generating Compact Code from Dataflow Specifications of Multirate Signal Processing Algorithms. IEEE Transactions on Circuits and Systems, 42(3):138--150, 1995.Google ScholarGoogle ScholarCross RefCross Ref
  4. S. S. Bhattacharyya, P. K. Murthy, and E. A. Lee. APGAN and RPMC: Complementary Heuristics for Translating DSP Block Diagrams into Efficient Software Implementations. Design Automation for Embedded Systems, 2(1):33--60, 1997.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. S. Bhattacharyya, S. Sriram, and E. A. Lee. Optimizing Synchronization in Multiprocessor DSP Systems. IEEE Transactions on Signal Processing, 45(6):1605--1618, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Efficiently Computing Static Single Assignment Form and the Control Dependence Graph. ACM Transactions on Programming Language and Systems (TOPLAS), 13(4):451--490, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. W. J. Dally and B. Towles. Principles and Practices of Interconnection Networks. Morgan Kaufmann, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. ELM Webpage. Concurrent VLSI Architecture Group, Stanford University. http://cva.stanford.edu/projects/elm/software.htm.Google ScholarGoogle Scholar
  9. K. Fatahalian, T. J. Knight, M. Houston, M. Erez, D. R. Horn, L. Leem, J. Y. Park, M. Ren, A. Aiken, W. J. Dally, and P. Hanrahan. Sequoia: Programming the Memory Hierarchy. In Conference on Supercomputing (SC), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. I. Gordon, W. Thies, and S. P. Amarasinghe. Exploiting Coarse-grained Task, Data, and Pipeline Parallelism in Stream Programs. In International Conference on Architecture Support for Programming Language and Operating Systems (ASPLOS), pages 151--162, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. I. Gordon, W. Thies, M. Karczmarek, J. Lin, A. S. Meli, A. A. Lamb, C. Leger, J. Wong, H. Hoffman, D. Maze, and S. P. Amarasinghe. A Stream Compiler for Communication-Exposed Architectures. In International Conference on Architecture Support for Programming Language and Operating Systems (ASPLOS), pages 291--303, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown. MiBench: A Free, Commercially Representative Embedded Benchmark Suite. In IEEE Annual Workshop on Workload Characterization, pages 83--94, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Herlihy and N. Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. H. P. Hofstee. Power Efficient Processor Architecture and the CELL Processor. In International Symposium on High-Performance Computer Architectures (HPCA), pages 258--262, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. B. Kam and J. D. Ullman. Monotone Data Flow Analysis Frameworks. Acta Informatica, 7(3):305--317, 1977.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. Karczmarek, W. Thies, and S. P. Amarasinghe. Phased Scheduling of Stream Programs. In Conference on Language, Compiler, and Tool Support for Embedded Systems (LCTES), pages 103--112, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. G. Karypis and V. Kumar. METIS: Unstructured Graph Partitioning and Sparse Matrix Ordering System. Technical report, Department of Computer Science, University of Minnesota, 1995.Google ScholarGoogle Scholar
  18. B. Khailany, W. J. Dally, U. J. Kapasi, P. Mattson, J. Namkoong, J. D. Owens, B. Towles, A. Chang, and S. Rixner. Imagine: Media Processing with Streams. IEEE Micro, 21(2):35--46, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M.-Y. Ko, C.-C. Shen, and S. S. Bhattacharyya. Memory-constrained Block Processing for DSP Software Optimization. Journal of Signal Processing Systems, 50(2):163--177, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. Kudlur and S. Mahlke. Orchestrating the Execution of Stream Programs on Multicore Platforms. In Conference on Programming Language Design and Implementation (PLDI), pages 114--124, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. K. N. Lalgudi, M. C. Papaefthymiou, and M. Potkonjak. Optimizing Computations for Effective Block-Processing. ACM Transactions on Design Automation of Electronic Systems, 5(3):604--630, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. Lam. Software Pipelining: An Effective Scheduling Technique on VLIW Machines. In Conference on Programming Language Design and Implementation (PLDI), pages 318--328, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. C. Lattner and V. Adve. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In International Symposium on Code Generation and Optimization (CGO), pages 75--86, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. E. A. Lee. A Coupled Hardware and Software Architecture for Programmable Digital Signal Processors. PhD thesis, University of California, Berkeley, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. E. A. Lee and D. G. Messerschmitt. Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing. IEEE Transactions on Computers, 36(1):24--35, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. T. Lengauer and R. E. Tarjan. A Fast Algorithm for Finding Dominators in Flowgraph. ACM Transactions on Programming Language and Systems (TOPLAS), 1(1):121--141, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Y. Lin, M. Kudlur, S. Mahlke, and T. Mudge. Hierarchical Coarse-grained Stream Compilation for Software Defined Radio. In International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), pages 115--124, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. P. Mattson. A Programming System for the Imagine Media Processor. PhD thesis, Stanford University, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. A. V. Oppenheim, A. S. Willsky, and S. H. Nawab. Signals & Systems. Prentice Hall, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. J. Park, J. Balfour, and W. J. Dally. Maximizing the Filter Rate of L0 Compiler-Managed Instruction Stores by Pinning. Technical Report 126, Concurrent VLSI Architecture Group, Stanford University, 2009.Google ScholarGoogle Scholar
  31. J. L. Pino, S. S. Bhattacharyya, and E. A. Lee. A Hierarchical Multiprocessor Scheduling Systems for DSP Applications. Technical Report UCB/ERL M95/36, University of California, Berkeley, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. B. R. Rau. Iterative Modulo Scheduling: An Algorithm for Software Pipelining Loops. In International Symposium on Microarchitecture (MICRO), pages 63--74, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. S. Ritz, M. Pankert, V. ÇZivojnović, and H. Meyr. Optimum Vectorization of Scalable Synchronous Dataflow Graphs. In International Conference on Application-Specific Array Processors (ASAP), pages 285--296, 1993.Google ScholarGoogle Scholar
  34. S. Sriram and S. S. Bhattacharyya. Embedded Mutiprocesors: Scheduling and Synchronization. CRC, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. W. Thies, M. Karczmarek, and S. P. Amarasinghe. StreamIt: A Language for Streaming Applications. In International Conference on Compiler Construction (CC), pages 179--196, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Y.-P. E. Wang and T. Ottosson. Cell Search in W-CDMA. IEEE Journal on Selected Areas in Communications, 18(8):1470--1482, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader