DOI: 10.1145/3131365.3131375

High-resolution measurement of data center microbursts

Published: 01 November 2017

Abstract

Data centers house some of the largest, fastest networks in the world. In contrast to, and as a result of, their speed, these networks operate on very small timescales: a 100 Gbps port processes a single packet in at most 500 ns, and end-to-end network latencies are under a millisecond. In this study, we explore the fine-grained behaviors of a large production data center using extremely high-resolution measurements (tens to hundreds of microseconds) of rack-level traffic. Our results show that characterizing network events like congestion and synchronized behavior in data centers does indeed require measurements at this resolution. In fact, we observe that more than 70% of bursts on the racks we measured are sustained for at most tens of microseconds, a timescale that is orders of magnitude finer than the resolution of most deployed measurement frameworks. Congestion events observed by less granular measurements are likely collections of smaller μbursts. Thus, we find that traffic at the edge is significantly less balanced than other metrics might suggest. Beyond the implications for measurement granularity, we hope these results will inform future data center load balancing and congestion control protocols.
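To make the timescale argument concrete, the following is a minimal sketch, not the measurement framework used in the study. It computes the serialization time of a packet and shows how averaging fine-grained utilization samples into coarser intervals can hide short bursts. The function names, the 50% utilization threshold, the sample values, and the downsampling factor are all illustrative assumptions.

```python
from typing import List, Tuple


def serialization_time_ns(packet_bytes: int, link_gbps: float) -> float:
    """Time to put one packet on the wire, in nanoseconds."""
    # bits divided by Gbit/s gives nanoseconds directly
    return packet_bytes * 8 / link_gbps


def find_bursts(utilization: List[float],
                threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each maximal run of samples
    whose utilization exceeds `threshold` (an assumed burst criterion)."""
    bursts = []
    start = None
    for i, u in enumerate(utilization):
        if u > threshold:
            if start is None:
                start = i
        elif start is not None:
            bursts.append((start, i - start))
            start = None
    if start is not None:
        bursts.append((start, len(utilization) - start))
    return bursts


def downsample(utilization: List[float], factor: int) -> List[float]:
    """Average consecutive groups of `factor` samples, approximating
    what a coarser-grained poller would report."""
    coarse = []
    for i in range(0, len(utilization), factor):
        chunk = utilization[i:i + factor]
        coarse.append(sum(chunk) / len(chunk))
    return coarse


# Hypothetical fine-grained link utilization samples (fraction of line rate).
fine = [0.125, 1.0, 1.0, 0.125, 0.125, 0.125,
        0.125, 1.0, 0.125, 0.125, 0.125, 0.125]

print(serialization_time_ns(1500, 100))      # 120.0 ns for a 1500-byte packet
print(find_bursts(fine))                     # [(1, 2), (7, 1)]
print(find_bursts(downsample(fine, 6)))      # [] : both bursts vanish
```

In this toy trace, two bursts are visible at the fine granularity, but after averaging over six samples neither interval crosses the threshold, which mirrors the abstract's point that coarse measurements can miss or blur short bursts.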

Published In

IMC '17: Proceedings of the 2017 Internet Measurement Conference
November 2017
509 pages
ISBN: 9781450351188
DOI: 10.1145/3131365
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

  • USENIX Assoc: USENIX Assoc

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data center traffic
  2. microbursts

Qualifiers

  • Research-article

Conference

IMC '17
IMC '17: Internet Measurement Conference
November 1 - 3, 2017
London, United Kingdom

Acceptance Rates

Overall Acceptance Rate 277 of 1,083 submissions, 26%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 238
  • Downloads (Last 6 weeks): 22
Reflects downloads up to 06 Jan 2025

Cited By

  • (2024) CARAVAN. Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation, 325-345. DOI: 10.5555/3691938.3691956. Online publication date: 10-Jul-2024
  • (2024) Reasoning about network traffic load property at production scale. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 1063-1081. DOI: 10.5555/3691825.3691884. Online publication date: 16-Apr-2024
  • (2024) Precise data center traffic engineering with constrained hardware resources. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 669-690. DOI: 10.5555/3691825.3691862. Online publication date: 16-Apr-2024
  • (2024) Credence. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 613-634. DOI: 10.5555/3691825.3691859. Online publication date: 16-Apr-2024
  • (2024) A large-scale deployment of DCTCP. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 239-252. DOI: 10.5555/3691825.3691839. Online publication date: 16-Apr-2024
  • (2024) Opportunistic Packet Forwarding for Proactive Transport in Datacenters. 2024 IFIP Networking Conference (IFIP Networking), 1-9. DOI: 10.23919/IFIPNetworking62109.2024.10619903. Online publication date: 3-Jun-2024
  • (2024) Uncovering Secrets of Microbursts in Datacenter Network Traffic. 2024 20th International Conference on Network and Service Management (CNSM), 1-5. DOI: 10.23919/CNSM62983.2024.10814641. Online publication date: 28-Oct-2024
  • (2024) Caravan: Practical Online Learning of In-Network ML Models with Labeling Agents. Proceedings of the 3rd Workshop on Practical Adoption Challenges of ML for Systems, 17-20. DOI: 10.1145/3704742.3704964. Online publication date: 4-Nov-2024
  • (2024) NetGSR: Towards Efficient and Reliable Network Monitoring with Generative Super Resolution. Proceedings of the ACM on Networking 2:CoNEXT4, 1-27. DOI: 10.1145/3696400. Online publication date: 25-Nov-2024
  • (2024) F3: Fast and Flexible Network Telemetry with an FPGA coprocessor. Proceedings of the ACM on Networking 2:CoNEXT4, 1-22. DOI: 10.1145/3696397. Online publication date: 25-Nov-2024
