DOI: 10.1145/1084834.1084906

Designing real-time H.264 decoders with dataflow architectures

Published: 19 September 2005

Abstract

High performance microprocessors are designed with general-purpose applications in mind. When it comes to embedded applications, these architectures typically perform control-intensive tasks in a System-on-Chip (SoC) design. But they are significantly inefficient for data-intensive tasks such as video encoding/decoding. Although configurable processors fill this gap by complementing the existing functional units with instruction extensions, their performance lags behind the needs of real-time embedded tasks. In this paper, we evaluate the performance potential of a dataflow processor for H.264 video decoding. We first profile the H.264 application to capture the amount of data traffic among modules. We use this information to guide the placement of H.264 modules in the WaveScalar dataflow architecture. A simulated annealing based placement algorithm produces the final placement aiming to optimize the communication costs between the modules in the dataflow architecture. In addition to outperforming contemporary embedded and customized processors, our simulated annealing guided design shows a speedup of 13% in execution time over the original WaveScalar architecture. With our dataflow design methodology, emerging embedded applications requiring several GOPS to meet real-time constraints can be drafted within a reasonable amount of design time.
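The placement step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm or cost model: it assumes a plain 2D grid of processing slots, Manhattan distance as the communication cost, and a geometric cooling schedule. The module names and traffic volumes are hypothetical stand-ins for the profiled H.264 data.

```python
import math
import random


def placement_cost(placement, traffic):
    """Sum of traffic volume times Manhattan distance between placed modules."""
    cost = 0
    for (a, b), volume in traffic.items():
        (ax, ay), (bx, by) = placement[a], placement[b]
        cost += volume * (abs(ax - bx) + abs(ay - by))
    return cost


def anneal_placement(modules, traffic, width, height, steps=20000,
                     t_start=10.0, t_end=0.01, seed=0):
    """Place modules on a width x height grid (at least 2 slots) by annealing."""
    rng = random.Random(seed)
    slots = [(x, y) for x in range(width) for y in range(height)]
    if len(modules) > len(slots):
        raise ValueError("grid too small for the module list")
    rng.shuffle(slots)
    # occupants[i] is the module sitting in slots[i], or None if the slot is empty.
    occupants = list(modules) + [None] * (len(slots) - len(modules))

    def to_placement(occ):
        return {m: slots[i] for i, m in enumerate(occ) if m is not None}

    current = placement_cost(to_placement(occupants), traffic)
    best, best_occ = current, occupants[:]
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        # Candidate move: swap the contents of two distinct slots.
        i, j = rng.sample(range(len(slots)), 2)
        occupants[i], occupants[j] = occupants[j], occupants[i]
        candidate = placement_cost(to_placement(occupants), traffic)
        delta = candidate - current
        # Always accept improvements; accept regressions with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if current < best:
                best, best_occ = current, occupants[:]
        else:
            occupants[i], occupants[j] = occupants[j], occupants[i]  # undo the swap
    return to_placement(best_occ), best


# Hypothetical module names and traffic volumes, for illustration only.
modules = ["entropy_decode", "inverse_transform", "motion_comp", "deblock"]
traffic = {
    ("entropy_decode", "inverse_transform"): 40,
    ("inverse_transform", "motion_comp"): 25,
    ("motion_comp", "deblock"): 60,
}
placement, cost = anneal_placement(modules, traffic, width=3, height=3)
print(placement, cost)
```

Swapping the contents of two slots keeps every intermediate state a valid placement, so the only tuning knobs in this sketch are the temperature range and the step count.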




    Published In

    CODES+ISSS '05: Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
    September 2005
    356 pages
    ISBN:1595931619
    DOI:10.1145/1084834

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. H.264
    2. WaveScalar
    3. dataflow architecture

    Qualifiers

    • Article

    Conference

    CODES/ISSS05

    Acceptance Rates

    CODES+ISSS '05 Paper Acceptance Rate 50 of 200 submissions, 25%;
    Overall Acceptance Rate 280 of 864 submissions, 32%

    Cited By

    • (2015) Efficient Fault-Tolerant Topology Reconfiguration Using a Maximum Flow Algorithm. ACM Transactions on Reconfigurable Technology and Systems 8:3 (1-24). DOI: 10.1145/2700417. Online publication date: 19-May-2015
    • (2013) A fault tolerant NoC architecture using quad-spare mesh topology and dynamic reconfiguration. Journal of Systems Architecture: the EUROMICRO Journal 59:7 (482-491). DOI: 10.1016/j.sysarc.2013.03.010. Online publication date: 1-Aug-2013
    • (2011) Building a Multi-kernel Embedded System with High Performance IPC Mechanism. Proceedings of the 2011 IEEE International Conference on High Performance Computing and Communications (506-511). DOI: 10.1109/HPCC.2011.72. Online publication date: 2-Sep-2011
    • (2010) Building Multi-kernel Embedded System on PAC Multi-core Platform. Proceedings of the 2010 10th International Conference on Quality Software (465-472). DOI: 10.1109/QSIC.2010.65. Online publication date: 14-Jul-2010
    • (2010) FFT Algorithms Evaluation on a Homogeneous Multi-processor System-on-Chip. Proceedings of the 2010 39th International Conference on Parallel Processing Workshops (58-64). DOI: 10.1109/ICPPW.2010.20. Online publication date: 13-Sep-2010
    • (2007) Chip multiprocessor based on data-driven multithreading model. International Journal of High Performance Systems Architecture 1:1 (34-43). DOI: 10.1504/IJHPSA.2007.013289. Online publication date: 1-Apr-2007
    • (2006) H.264 Video Decoder Design: Beyond RTL Design Implementation. 2006 IEEE Workshop on Signal Processing Systems Design and Implementation (107-112). DOI: 10.1109/SIPS.2006.352564. Online publication date: Oct-2006
