Resilient overlay networks

Published: 21 October 2001

Abstract

A Resilient Overlay Network (RON) is an architecture that allows distributed Internet applications to detect and recover from path outages and periods of degraded performance within several seconds, improving over today's wide-area routing protocols that take at least several minutes to recover. A RON is an application-layer overlay on top of the existing Internet routing substrate. The RON nodes monitor the functioning and quality of the Internet paths among themselves, and use this information to decide whether to route packets directly over the Internet or by way of other RON nodes, optimizing application-specific routing metrics.

Results from two sets of measurements of a working RON deployed at sites scattered across the Internet demonstrate the benefits of our architecture. For instance, over a 64-hour sampling period in March 2001 across a twelve-node RON, there were 32 significant outages, each lasting over thirty minutes, over the 132 measured paths. RON's routing mechanism was able to detect, recover, and route around all of them, in less than twenty seconds on average, showing that its methods for fault detection and recovery work well at discovering alternate paths in the Internet. Furthermore, RON was able to improve the loss rate, latency, or throughput perceived by data transfers; for example, about 5% of the transfers doubled their TCP throughput and 5% of our transfers saw their loss probability reduced by 0.05. We found that forwarding packets via at most one intermediate RON node is sufficient to overcome faults and improve performance in most cases. These improvements, particularly in the area of fault detection and recovery, demonstrate the benefits of moving some of the control over routing into the hands of end-systems.
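
The routing decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `directed_paths` and `best_path`, the `latency_ms` table, and the sample values are all hypothetical, and latency is assumed as the only metric. The sketch picks either the direct path or a path through at most one intermediate node, whichever has the lower total latency, and treats a missing or infinite measurement as an outage. It also checks the path count quoted above: a twelve-node overlay has 12 × 11 = 132 directed paths.

```python
import math
from itertools import permutations


def directed_paths(nodes):
    """Return all ordered (src, dst) pairs; N nodes give N * (N - 1) pairs."""
    return list(permutations(nodes, 2))


def best_path(latency_ms, nodes, src, dst):
    """Choose the direct path or a one-intermediate path with lowest latency.

    latency_ms maps (src, dst) to a measured latency (hypothetical data);
    a missing entry or math.inf is treated as an unusable path.
    """
    best_route = [src, dst]
    best_cost = latency_ms.get((src, dst), math.inf)
    for mid in nodes:
        if mid in (src, dst):
            continue
        cost = (latency_ms.get((src, mid), math.inf)
                + latency_ms.get((mid, dst), math.inf))
        if cost < best_cost:
            best_route, best_cost = [src, mid, dst], cost
    return best_route, best_cost


# Illustrative measurements: the direct A -> B path is down.
nodes = ["A", "B", "C"]
latency_ms = {("A", "B"): math.inf, ("A", "C"): 30.0, ("C", "B"): 25.0}

route, cost = best_path(latency_ms, nodes, "A", "B")
print(route, cost)
print(len(directed_paths(range(12))))
```

Restricting the search to one intermediate node keeps the per-destination cost linear in the number of nodes, which fits the abstract's observation that a single intermediate hop is sufficient in most cases.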

      Reviews

      Alexandru Petrescu

      In a world where peer-to-peer networks flourish, and where Internet path congestion and oscillations are daily events, the need for new efficient routing mechanisms is more and more pressing. Andersen, Balakrishnan, Kaashoek, and Morris present resilient overlay networks (RONs) as groups of nodes, distributed over large areas, whose users agree to engage in cooperative networking, and whose paths are formed over actual Internet routing paths. The main characteristics of a RON are that the number of participating nodes is small (up to 50), and that communication between sites follows paths that circumvent temporary failures of the actual Internet paths. This is achieved by continuous probing of the direct links between sites, and by employing a new link-state routing protocol (different from open shortest path first (OSPF) and border gateway protocol (BGP)). Simply put, when the underlying segments of a direct path between two RON nodes fail (BGP failures are often cited), the overlay network redirects the entire path toward an intermediary RON node, apparently lengthening the entire path, but still offering connectivity. RON nodes have addresses different from Internet protocol version 4 (IPv4) or Internet protocol version 6 (IPv6) addresses. Actual experiments, performed by the authors, included a 16-node deployment in the USA and Europe. As expected, another distinguishing trait of RON networking is the ability of applications at the uppermost layer to make routing decisions (traditionally, routing and application layers are separated, with the inconvenience of application interruption when routing fails). The authors pay detailed attention to motivating the overlay routing approach.

      Not only do they describe an actual implementation, including simulation, test deployment, and performance measurements, but they also address, in a separate discussion section, tough questions on potential violation of the deployed Internet policy routing (presumably due to tunneling), limited RON size and scalability, and network address translation (NAT) traversal. Finally, one aspect whose treatment seems to be overlooked is one that lies at the very heart of a routing protocol: loop avoidance. While a description of path lookup and building by using link-state exchanges is given, proofs (at least conceptual) of loop avoidance are not mentioned at all. The paper provides a comprehensive bibliographical list. Many of the references are correctly used as explanations of the current BGP routing instabilities, as well as of how these influence transmission control protocol (TCP) applications; thus, they offer perfect motivations for the need to overlay network routing.

      Online Computing Reviews Service
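
The continuous probing mentioned in the review can be sketched as a small failure detector. This is an assumption-laden illustration rather than the paper's mechanism: the class name `ProbeMonitor`, its methods, and the default threshold of four consecutive lost probes are all hypothetical choices made for the example. Each path keeps a counter of consecutive unanswered probes, and a path is reported as down once the counter reaches the threshold.

```python
from collections import defaultdict


class ProbeMonitor:
    """Track consecutive probe losses per path (hypothetical sketch)."""

    def __init__(self, loss_threshold=4):
        # loss_threshold is an assumed value, not taken from the paper.
        self.loss_threshold = loss_threshold
        self.consecutive_losses = defaultdict(int)

    def record(self, path, replied):
        """Record one probe result for a (src, dst) path."""
        if replied:
            self.consecutive_losses[path] = 0
        else:
            self.consecutive_losses[path] += 1

    def is_down(self, path):
        """A path counts as down after loss_threshold losses in a row."""
        return self.consecutive_losses[path] >= self.loss_threshold


monitor = ProbeMonitor(loss_threshold=3)
for _ in range(3):
    monitor.record(("A", "B"), replied=False)
print(monitor.is_down(("A", "B")))
```

A single successful reply resets the counter, so a path recovers in the detector as soon as probes start getting through again.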


      • Published in

        ACM SIGOPS Operating Systems Review, Volume 35, Issue 5
        Dec. 2001
        243 pages
        ISSN: 0163-5980
        DOI: 10.1145/502059
        • SOSP '01: Proceedings of the eighteenth ACM symposium on Operating systems principles
          October 2001
          254 pages
          ISBN: 1581133898
          DOI: 10.1145/502034

        Copyright © 2001 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States
