skip to main content
10.1145/1023833.1023867acmconferencesArticle/Chapter ViewAbstractPublication PagesesweekConference Proceedingsconference-collections
Article

Automatic data partitioning for the agere payload plus network processor

Published: 22 September 2004 Publication History

Abstract

With the ever-increasing pervasiveness of the Internet and its stringent performance requirements, network system designers have begun utilizing specialized chips to increase the performance of network functions. To increase performance, many more advanced functions, such as traffic shaping and policing, are being implemented at the network interface layer to reduce delays that occur when these functions are handled by a general-purpose CPU. While some designs use ASICs to handle network functions, many system designers have moved toward using programmable network processors due to their increased exibility and lower design cost. In this paper, we describe a code generation technique designed for the Agere Payload Plus network processor. This processor utilizes a multi-block pipeline containing a Fast Pattern Processor (FPP) for classification, a Routing Switch Processor (RSP) for traffic management and a third block, the Agere Systems Interface (ASI), which provides additional functionality for performance. This paper focuses on code generation for the clustered VLIW compute engines on the RSP. Currently, due to the real-time nature of the applications run on the APP, the programmer must lay out and partition the application-specific data by hand to get good performance.The major contribution of this paper is to remove the need for hand partitioning for the RSP compute engines. We propose both a greedy code-generation approach that achieves harmonic mean performance equal to code that has been hand partitioned by an application programmer and a genetic algorithm that achieves a harmonic mean speedup of 1.08 over the same hand-partitioned code. Achieving harmonic mean performance that is equal to or better than hand partitioning removes the need to hand code for performance. This allows the programmer to spend more time on algorithm development.

References

[1]
A. Aletá, J. Codina, J. Sánchez, A. González, and D. Kaeli. Exploiting pseudo-schedules to guide data dependence graph partitioning. In Procedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques, Charlottesville, VA, September 2002.
[2]
V. Allan, S. J. Beaty, B. Su, and P. H. Sweany. Building a retargetable local instruction scheduler. Software Practice and Experience, 28(3):249--284, March 1998.
[3]
N. C. Audsley, A. Burns, M. F. Richardson, and A. J. Wellings. Hard real-time scheduling: The deadline monotonic approach. In Proceedings 8th IEEE Workshop on Real-Time Operating Systems and Software, Atlanta, GA, 1991.
[4]
G. Desoli. Instruction assignment for clustered VLIW DSP compilers: A new approach. HP Labs Technical Report HPL-98-13, HP Labs, Jan. 1998.
[5]
J. R. Ellis. A Compiler for VLIW Architectures. PhD thesis, Yale University, 1984.
[6]
L. George and M. Blume. Taming the IXP network processor. In Proceedings of the 2003 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 26--37, San Diego, CA, June 2003.
[7]
J. Hiser, S. Carr, and P. Sweany. Global register partitioning. In Proceedings of the 2000 International Conference on Parallel Architectures and Compiler Techniques, pages 13--23, Philadelphia, PA, October 2000.
[8]
J. Hiser, S. Carr, P. Sweany, and S. Beaty. Register partitioning for software pipelining with partitioned register banks. In Proceedings of the 14th International Parallel and Distributed Processing Symposium, pages 211--218, Cancun, Mexico, May 2000.
[9]
R. Leupers. Instruction scheduling for clustered VLIW DSPs. In Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques, pages 291--300, Philadelphia, PA, Oct. 2000.
[10]
E. Nystrom and A. Eichenberger. Effective cluster assignment for modulo scheduling. In Proceedings of the 31st International Symposium on Microarchitecture (MICRO-31), pages 103--114, Dallas, TX, December 1998.
[11]
E. Özer, S. Banerjia, and T. Conte. Unified assign and schedule: A new approach to scheduling for clustered register file microarchitectures. In Proceedings of the 31st International Symposium on Microarchitecture (MICRO-31), pages 308--316, Dallas, TX, December 1998.
[12]
V. Rayward-Smith, I. Osman, C. Reeves, and G. Smith. Modern Heuristic Search Methods. John Wiley and Sons, Ltd., 1996.
[13]
D. Sule, P. Sweany, and S. Carr. Evaluating register partitioning with genetic algorithms. In Proceedings of the Fourth International Conference on Massively Parallel Computing Systems, Ischia, Italy, April 2002.
[14]
D. Whitley. An overview of evolutionary algorithms. Journal of Information and Software Technology, 43:817--831, 2001.
[15]
X. Zhuang and S. Pande. Resolving register bank conflicts for a network processor. In Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques, pages 269--278, New Orleans, LA, September 2003.

Cited By

View all
  • (2021)Fine-grained pipeline parallelization for network function programsProceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO51591.2021.9370309(162-173)Online publication date: 27-Feb-2021
  • (2007)RulerProceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems10.1145/1323548.1323550(1-10)Online publication date: 3-Dec-2007

Index Terms

  1. Automatic data partitioning for the agere payload plus network processor

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    CASES '04: Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
    September 2004
    324 pages
    ISBN:1581138903
    DOI:10.1145/1023833
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 September 2004

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. network processors
    2. partitioning
    3. scheduling

    Qualifiers

    • Article

    Conference

    CASES04

    Acceptance Rates

    Overall Acceptance Rate 52 of 230 submissions, 23%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 16 Feb 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2021)Fine-grained pipeline parallelization for network function programsProceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO51591.2021.9370309(162-173)Online publication date: 27-Feb-2021
    • (2007)RulerProceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems10.1145/1323548.1323550(1-10)Online publication date: 3-Dec-2007

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media