ABSTRACT
Although traffic between Web servers and Web browsers is readily apparent to many knowledgeable end users, fewer are aware of the extent of server-to-server Web traffic carried over the public Internet. We refer to the former class of traffic as front-office Internet Web traffic and the latter as back-office Internet Web traffic (or just front-office and back-office traffic, for short). Back-office traffic, which may or may not be triggered by end-user activity, is essential for today's Web as it supports a number of popular but complex Web services including large-scale content delivery, social networking, indexing, searching, advertising, and proxy services. This paper takes a first look at back-office traffic, measuring it from various vantage points, including from within ISPs, IXPs, and CDNs. We describe techniques for identifying back-office traffic based on the roles that this traffic plays in the Web ecosystem. Our measurements show that back-office traffic accounts for a significant fraction not only of core Internet traffic, but also of Web transactions in terms of requests and responses. Finally, we discuss the implications and opportunities that the presence of back-office traffic presents for the evolution of the Internet ecosystem.