research-article

Characterizing Organizational Use of Web-Based Services: Methodology, Challenges, Observations, and Insights

Published: 01 October 2011

Abstract

Today’s Web provides many different functionalities, including communication, entertainment, social networking, and information retrieval. In this article, we analyze traces of HTTP activity from a large enterprise and from a large university to identify and characterize Web-based service usage. Our work provides an initial methodology for the analysis of Web-based services. While it is nontrivial to identify the classes, instances, and providers for each transaction, our results show that most of the traffic comes from a small subset of providers, which can be classified manually. Furthermore, we assess both qualitatively and quantitatively how the Web has evolved over the past decade, and discuss the implications of these changes.

      • Published in

        ACM Transactions on the Web  Volume 5, Issue 4
        October 2011
        154 pages
        ISSN:1559-1131
        EISSN:1559-114X
        DOI:10.1145/2019643
        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 October 2011
        • Accepted: 1 February 2011
        • Revised: 1 June 2010
        • Received: 1 August 2009

        Qualifiers

        • research-article
        • Research
        • Refereed
