ABSTRACT
This paper presents a systematic study of the properties of a large number of Web sites hosted by a major ISP. To our knowledge, ours is the first comprehensive study of a large server farm that contains thousands of commercial Web sites. We also perform a simulation analysis to estimate the potential performance benefits of content delivery networks (CDNs) for these Web sites. We make several observations about the current usage of Web technologies and about Web site performance characteristics. First, compared with previous client workload studies, the Web server farm workload contains a much higher proportion of uncacheable responses and of responses that require mandatory cache validation. A significant reason for this is that cookie use is prevalent among our population, especially among the more popular sites. However, we found indications of widespread indiscriminate use of cookies, which unnecessarily impedes many content delivery optimizations. We also found that most Web sites do not use the cache-control features of the HTTP 1.1 protocol, resulting in suboptimal performance. Moreover, the implicit expiration time in client caches for responses is constrained by the maximum values allowed in the Squid proxy. Finally, our simulation results indicate that most Web sites benefit from the use of a CDN. The amount of the benefit depends on site popularity, and, somewhat surprisingly, a CDN may increase the peak-to-average request ratio at the origin server because the CDN can decrease the average request rate more than the peak request rate.
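The last observation can be illustrated with a small arithmetic sketch. The request rates and per-interval CDN hit ratios below are hypothetical values chosen only for illustration, not data from the study, and the helper names `peak_to_average` and `origin_rates` are likewise assumptions of this sketch. It shows how the origin's peak-to-average ratio can rise even though both the peak and the average load fall.

```python
def peak_to_average(rates):
    """Return the ratio of the peak request rate to the mean request rate."""
    return max(rates) / (sum(rates) / len(rates))


def origin_rates(rates, hit_ratios):
    """Requests that still reach the origin after the CDN serves its hits."""
    return [r * (1.0 - h) for r, h in zip(rates, hit_ratios)]


# Hypothetical request rates (requests/s) over four intervals, without a CDN.
no_cdn = [10.0, 10.0, 10.0, 50.0]

# Hypothetical CDN hit ratios per interval. The lower value in the peak
# interval is an assumption made for illustration only.
hit = [0.8, 0.8, 0.8, 0.5]

with_cdn = origin_rates(no_cdn, hit)

print("peak/average without CDN:", peak_to_average(no_cdn))
print("peak/average with CDN:   ", peak_to_average(with_cdn))
```

In this sketch the average origin rate drops from 20 to about 7.75 requests/s while the peak drops only from 50 to 25, so the ratio grows from 2.5 to roughly 3.2. The origin is less loaded overall, yet its load is burstier relative to its mean.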