Cache behavior of network protocols

Published: 01 June 1997

Abstract

In this paper we present a performance study of memory reference behavior in network protocol processing, using an Internet-based protocol stack implemented in the x-kernel running in user space on a MIPS R4400-based Silicon Graphics machine. We use the protocols to drive a validated execution-driven architectural simulator of our machine. We characterize the behavior of network protocol processing, deriving statistics such as cache miss rates and percentage of time spent waiting for memory. We also determine how sensitive protocol processing is to the architectural environment, varying factors such as cache size and associativity, and predict performance on future machines.

We show that network protocol cache behavior varies widely, with miss rates ranging from 0 to 28 percent, depending on the scenario. We find that instruction cache behavior has the greatest effect on protocol latency in most cases, and that cold cache behavior is very different from warm cache behavior. We demonstrate the upper bounds on performance that can be expected by improving memory behavior, and the impact of features such as associativity and larger cache sizes. In particular, we find that TCP is more sensitive to cache behavior than UDP, gaining larger benefits from improved associativity and bigger caches. We predict that network protocols will scale well with CPU speeds in the future.

Published in

ACM SIGMETRICS Performance Evaluation Review, Volume 25, Issue 1
June 1997
298 pages
ISSN: 0163-5999
DOI: 10.1145/258623

SIGMETRICS '97: Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
June 1997
302 pages
ISBN: 0897919092
DOI: 10.1145/258612

Copyright © 1997 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States
