PLDI conference proceedings, DOI: 10.1145/1065010.1065038

Shangri-La: achieving high performance from compiled network applications while enabling ease of programming

Published: 12 June 2005

Abstract

Programming network processors is challenging. To sustain high line rates, network processors have extremely tight memory access and instruction budgets. Achieving desired performance has traditionally required hand-coded assembly. Researchers have recently proposed high-level programming languages for packet processing, but the challenges of compiling these languages into code that is competitive with hand-tuned assembly remain unanswered.

This paper describes the Shangri-La compiler, which accepts a packet program written in a C-like high-level language and applies scalar and specialized optimizations to generate a highly optimized binary. Hot code paths identified by profiling are mapped across processing elements to maximize processor utilization. Since our compilation target has no hardware caches, software-controlled caches are generated for frequently accessed application data structures. Packet handling optimizations significantly reduce per-packet memory access and instruction counts. Finally, a custom stack model maps stack frames to the fastest levels of the target processor's heterogeneous memory hierarchy.

Binaries generated by the compiler were evaluated on the Intel IXP2400 network processor with eight packet processing cores and eight threads per core. Our results show the importance of both traditional and specialized optimization techniques for achieving the maximum forwarding rates on three network applications: L3-Switch, MPLS and Firewall.
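The abstract mentions software-controlled caches for frequently accessed data structures on a target without hardware caches. The general idea can be illustrated with a minimal sketch in C. This is not the code Shangri-La generates, and the abstract does not describe the actual cache organization. The direct-mapped layout, the invalidate-on-update policy, and every name below (`slow_table`, `slow_read`, `cached_read`, `table_update`, `CACHE_LINES`, `TABLE_SIZE`) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINES 16u   /* hypothetical cache size; must be a power of two */
#define TABLE_SIZE  256u  /* hypothetical size of the backing table */

/* Stand-in for an application table held in slow off-chip memory. */
static uint32_t slow_table[TABLE_SIZE];
static unsigned slow_reads;  /* counts accesses that reach slow memory */

/* Every call models one expensive memory access. */
uint32_t slow_read(uint32_t key)
{
    assert(key < TABLE_SIZE);
    slow_reads++;
    return slow_table[key];
}

/* One line of the software cache, assumed to live in fast local memory. */
typedef struct {
    bool     valid;
    uint32_t key;
    uint32_t value;
} cache_line_t;

static cache_line_t cache[CACHE_LINES];

/* Direct-mapped lookup: the low bits of the key select the line. */
uint32_t cached_read(uint32_t key)
{
    cache_line_t *line = &cache[key & (CACHE_LINES - 1u)];
    if (line->valid && line->key == key) {
        return line->value;          /* hit: no slow-memory access */
    }
    line->value = slow_read(key);    /* miss: fetch and fill the line */
    line->key   = key;
    line->valid = true;
    return line->value;
}

/* Update the backing table and invalidate any stale cached copy. */
void table_update(uint32_t key, uint32_t value)
{
    assert(key < TABLE_SIZE);
    slow_table[key] = value;
    cache_line_t *line = &cache[key & (CACHE_LINES - 1u)];
    if (line->valid && line->key == key) {
        line->valid = false;
    }
}
```

In this sketch, repeated calls to `cached_read` with the same key touch slow memory only once until `table_update` changes that entry or a conflicting key evicts the line. That is the kind of per-packet memory access reduction the abstract refers to, shown here under the stated assumptions.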
