DOI: 10.1145/988672.988743
Article

Characterization of a large web site population with implications for content delivery

Published: 17 May 2004

ABSTRACT

This paper presents a systematic study of the properties of a large number of Web sites hosted by a major ISP. To our knowledge, ours is the first comprehensive study of a large server farm that contains thousands of commercial Web sites. We also perform a simulation analysis to estimate potential performance benefits of content delivery networks (CDNs) for these Web sites. We make several interesting observations about the current usage of Web technologies and Web site performance characteristics. First, compared with previous client workload studies, the Web server farm workload contains a much higher degree of uncacheable responses and responses that require mandatory cache validations. A significant reason for this is that cookie use is prevalent among our population, especially among more popular sites. However, we found an indication of widespread indiscriminate usage of cookies, which unnecessarily impedes the use of many content delivery optimizations. We also found that most Web sites do not utilize the cache-control features of the HTTP 1.1 protocol, resulting in suboptimal performance. Moreover, the implicit expiration time in client caches for responses is constrained by the maximum values allowed in the Squid proxy. Finally, our simulation results indicate that most Web sites benefit from the use of a CDN. The amount of the benefit depends on site popularity, and, somewhat surprisingly, a CDN may increase the peak-to-average request ratio at the origin server because the CDN can decrease the average request rate more than the peak request rate.
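
The last observation can be illustrated with simple arithmetic. The sketch below uses made-up request rates (not data from the paper) and a hypothetical helper, `peak_to_average`, to show how a CDN that offloads a larger share of steady traffic than of burst traffic raises the peak-to-average ratio seen at the origin server.

```python
def peak_to_average(rates):
    """Return the peak-to-average ratio of a sequence of request rates."""
    return max(rates) / (sum(rates) / len(rates))


# Hypothetical per-interval request rates at the origin (requests/sec).
without_cdn = [100, 100, 100, 400]

# Assumption for illustration only: the CDN absorbs 80% of the steady
# traffic but just 50% of the burst interval.
with_cdn = [20, 20, 20, 200]

ratio_before = peak_to_average(without_cdn)  # 400 / 175
ratio_after = peak_to_average(with_cdn)      # 200 / 65

print(f"without CDN: {ratio_before:.2f}, with CDN: {ratio_after:.2f}")
```

Both the peak and the average fall under these assumed numbers, but the average falls by a larger factor, so the ratio at the origin grows.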


      • Published in

        WWW '04: Proceedings of the 13th international conference on World Wide Web
        May 2004
        754 pages
        ISBN:158113844X
        DOI:10.1145/988672

        Copyright © 2004 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
