Abstract
In this paper we present a performance study of memory reference behavior in network protocol processing, using an Internet-based protocol stack implemented in the x-kernel and running in user space on a MIPS R4400-based Silicon Graphics machine. We use the protocols to drive a validated, execution-driven architectural simulator of our machine. We characterize the behavior of network protocol processing, deriving statistics such as cache miss rates and the percentage of time spent waiting for memory. We also determine how sensitive protocol processing is to the architectural environment, varying factors such as cache size and associativity, and we predict performance on future machines.
We show that network protocol cache behavior varies widely, with miss rates ranging from 0 to 28 percent depending on the scenario. We find that instruction cache behavior has the greatest effect on protocol latency in most cases, and that cold-cache behavior is very different from warm-cache behavior. We demonstrate the upper bounds on performance that can be achieved by improving memory behavior, and the impact of features such as higher associativity and larger cache sizes. In particular, we find that TCP is more sensitive to cache behavior than UDP, gaining larger benefits from improved associativity and bigger caches. We predict that network protocols will scale well with CPU speeds in the future.
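The kind of measurement the abstract describes (replaying a memory-reference stream through caches of different size and associativity, comparing cold-start with warm-start miss rates, and converting misses into time spent waiting for memory) can be illustrated with a toy model. This is a minimal sketch, not the authors' validated execution-driven simulator, and it does not reproduce their numbers. It assumes a trace-driven LRU set-associative cache with a fixed miss penalty and no overlap between misses and computation; the names `SetAssociativeCache`, `cold_and_warm_miss_rates` and `memory_stall_fraction`, the 32-byte line size, and the synthetic trace are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy trace-driven cache model with LRU replacement (illustrative only)."""

    def __init__(self, size_bytes, line_bytes, ways):
        if size_bytes % (line_bytes * ways) != 0:
            raise ValueError("size must be a multiple of line_bytes * ways")
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set; key order tracks recency, oldest first.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Simulate one memory reference; return True on a hit."""
        block = addr // self.line_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) == self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = None
        return False

    def reset_stats(self):
        """Clear counters but keep cache contents (used for warm-cache runs)."""
        self.hits = 0
        self.misses = 0

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def cold_and_warm_miss_rates(trace, size_bytes, line_bytes=32, ways=1):
    """Replay `trace` twice: once from an empty cache, once with it pre-loaded."""
    cache = SetAssociativeCache(size_bytes, line_bytes, ways)
    for addr in trace:  # cold pass: the cache starts empty
        cache.access(addr)
    cold = cache.miss_rate()
    cache.reset_stats()
    for addr in trace:  # warm pass: contents left over from the cold pass
        cache.access(addr)
    return cold, cache.miss_rate()


def memory_stall_fraction(instructions, misses, miss_penalty_cycles, base_cpi=1.0):
    """Fraction of cycles spent waiting on misses, assuming a fixed miss
    penalty and no overlap of misses with computation (simplifying assumptions)."""
    stall_cycles = misses * miss_penalty_cycles
    busy_cycles = instructions * base_cpi
    return stall_cycles / (busy_cycles + stall_cycles)


if __name__ == "__main__":
    # Hypothetical trace: two 2 KB code regions, 8 KB apart, interleaved line by line.
    trace = [base + i * 32 for i in range(64) for base in (0, 8192)]
    print(cold_and_warm_miss_rates(trace, 8192, ways=1))   # (1.0, 1.0)
    print(cold_and_warm_miss_rates(trace, 8192, ways=2))   # (1.0, 0.0)
    print(cold_and_warm_miss_rates(trace, 16384, ways=1))  # (1.0, 0.0)
    print(memory_stall_fraction(1000, 10, 100))            # 0.5
```

In this contrived trace every configuration misses on every reference when started cold, which is one simple reason cold-cache and warm-cache results can diverge so sharply. Once warm, doubling either the associativity or the cache size removes the conflict misses that the direct-mapped 8 KB cache keeps suffering. Real protocol reference streams are far less regular, so the sketch illustrates the measurement method rather than any result of the study.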
Cache behavior of network protocols. SIGMETRICS '97: Proceedings of the 1997 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems.