
Shangri-La: achieving high performance from compiled network applications while enabling ease of programming

Authors:

Michael K. Chen,

Roy Ju

PLDI '05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation

Pages 224 - 236

https://doi.org/10.1145/1065010.1065038

Published: 12 June 2005

Abstract

Programming network processors is challenging. To sustain high line rates, network processors have extremely tight memory access and instruction budgets. Achieving desired performance has traditionally required hand-coded assembly. Researchers have recently proposed high-level programming languages for packet processing, but the challenges of compiling these languages into code that is competitive with hand-tuned assembly remain unanswered.

This paper describes the Shangri-La compiler, which accepts a packet program written in a C-like high-level language and applies scalar and specialized optimizations to generate a highly optimized binary. Hot code paths identified by profiling are mapped across processing elements to maximize processor utilization. Since our compilation target has no hardware caches, software-controlled caches are generated for frequently accessed application data structures. Packet handling optimizations significantly reduce per-packet memory access and instruction counts. Finally, a custom stack model maps stack frames to the fastest levels of the target processor's heterogeneous memory hierarchy.

Binaries generated by the compiler were evaluated on the Intel IXP2400 network processor with eight packet processing cores and eight threads per core. Our results show the importance of both traditional and specialized optimization techniques for achieving the maximum forwarding rates on three network applications, L3-Switch, MPLS and Firewall.
