Abstract
Today’s Web provides many different functionalities, including communication, entertainment, social networking, and information retrieval. In this article, we analyze traces of HTTP activity from a large enterprise and from a large university to identify and characterize Web-based service usage. Our work provides an initial methodology for the analysis of Web-based services. While it is nontrivial to identify the classes, instances, and providers for each transaction, our results show that most of the traffic comes from a small subset of providers, which can be classified manually. Furthermore, we assess both qualitatively and quantitatively how the Web has evolved over the past decade, and discuss the implications of these changes.
Index Terms
- Characterizing Organizational Use of Web-Based Services: Methodology, Challenges, Observations, and Insights