DOI: 10.1145/1810085.1810102
research-article

Evaluation of parallel H.264 decoding strategies for the Cell Broadband Engine

Published: 02 June 2010

ABSTRACT

How to develop efficient and scalable parallel applications is the key challenge for emerging many-core architectures. We investigate this question by implementing and comparing two parallel H.264 decoders on the Cell architecture. Future many-cores are expected to use a Cell-like local store memory hierarchy rather than a non-scalable shared memory. The two implemented parallel algorithms, the Task Pool (TP) and the novel Ring-Line (RL) approach, both exploit macroblock-level parallelism. The TP implementation follows the master-slave paradigm and is highly dynamic, so that in theory perfect load balancing can be achieved. The RL approach is distributed and more predictable in the sense that the mapping of macroblocks to processing elements is fixed. This fixed mapping makes it possible to exploit data locality better, to overlap communication with computation, and to reduce communication and synchronization overhead. While TP is more scalable in theory, the measured scalability favors RL. Using 16 SPEs, RL obtains a scalability of 12x, while TP achieves only 10.3x. More importantly, the absolute performance of RL is much higher. Using 16 SPEs, RL achieves a throughput of 139.6 frames per second (fps), while TP achieves only 76.6 fps. A large part of this additional performance advantage is due to hiding the memory latency. From these results we conclude that, in order to fully leverage the performance of future many-cores, a centralized master should be avoided and the mapping of tasks to cores should be predictable, so that the memory latency can be hidden.
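
The figures quoted in the abstract can be put side by side with a little arithmetic. The sketch below is not from the paper; it only derives parallel efficiency and throughput ratios from the four numbers given above. The class name `Result` and its properties are illustrative, and the "implied single-SPE throughput" relies on the assumption that each scalability figure is measured against the same decoder running on one SPE, which the abstract does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One measurement as quoted in the abstract (names are illustrative)."""
    name: str
    spes: int        # number of SPEs used
    speedup: float   # reported scalability, e.g. 12.0 for "12x"
    fps: float       # reported throughput in frames per second

    @property
    def efficiency(self) -> float:
        # Parallel efficiency: fraction of ideal linear speedup achieved.
        return self.speedup / self.spes

    @property
    def implied_single_spe_fps(self) -> float:
        # ASSUMPTION: the speedup is relative to this same decoder on one SPE.
        # The abstract does not say which baseline was used.
        return self.fps / self.speedup


# Numbers taken directly from the abstract.
RL = Result("RL", spes=16, speedup=12.0, fps=139.6)
TP = Result("TP", spes=16, speedup=10.3, fps=76.6)


def compare(a: Result, b: Result) -> dict:
    """Ratios of a over b for throughput, speedup and implied baseline."""
    return {
        "throughput_ratio": a.fps / b.fps,
        "speedup_ratio": a.speedup / b.speedup,
        "baseline_ratio": a.implied_single_spe_fps / b.implied_single_spe_fps,
    }


if __name__ == "__main__":
    for r in (RL, TP):
        print(f"{r.name}: efficiency {r.efficiency:.1%}, "
              f"implied 1-SPE throughput {r.implied_single_spe_fps:.2f} fps")
    for key, value in compare(RL, TP).items():
        print(f"{key}: {value:.2f}x")
```

Under that baseline assumption, the throughput gap (about 1.82x) is larger than the gap in scalability alone (about 1.17x). This is consistent with the abstract's remark that much of RL's advantage comes from hiding memory latency rather than from better scaling.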


Published in

ICS '10: Proceedings of the 24th ACM International Conference on Supercomputing
June 2010
365 pages
ISBN: 9781450300186
DOI: 10.1145/1810085

Copyright © 2010 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 584 of 2,055 submissions, 28%
