DOI: 10.1145/3132747.3132759
research-article

CrystalNet: Faithfully Emulating Large Production Networks

Published: 14 October 2017

ABSTRACT

Network reliability is critical for large clouds and online service providers like Microsoft. Our network is large, heterogeneous, and complex, and it undergoes constant churn. In such an environment, even small issues triggered by device failures, buggy device software, configuration errors, unproven management tools, and unavoidable human errors can quickly cause large outages. A promising way to minimize such outages is to proactively validate all network operations in a high-fidelity network emulator before they are carried out in production. To this end, we present CrystalNet, a cloud-scale, high-fidelity network emulator. It runs real network device firmware in a network of containers and virtual machines, loaded with production configurations. Network engineers can use the same management tools and methods to interact with the emulated network as they do with a production network. CrystalNet can handle heterogeneous device firmware and can scale to emulate thousands of network devices in a matter of minutes. To reduce resource consumption, it carefully selects the boundary of the emulation while ensuring that the propagation of network changes remains correct. Microsoft's network engineers use CrystalNet daily to test planned network operations. Our experience shows that CrystalNet enables operators to detect many issues that could trigger significant outages.

Supplemental Material

crystalnet.mp4 (MP4, 2.2 GB)

    • Published in

      SOSP '17: Proceedings of the 26th Symposium on Operating Systems Principles
      October 2017
      677 pages
      ISBN:9781450350853
      DOI:10.1145/3132747

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 131 of 716 submissions, 18%
