DOI: 10.1145/2934872.2934910

The Good, the Bad, and the Differences: Better Network Diagnostics with Differential Provenance

Published: 22 August 2016

Abstract

In this paper, we propose a new approach to diagnosing problems in complex distributed systems. Our approach is based on the insight that many of the trickiest problems are anomalies. For instance, in a network, problems often affect only a small fraction of the traffic (e.g., perhaps a certain subnet), or they only manifest infrequently. Thus, it is quite common for the operator to have “examples” of both working and non-working traffic readily available – perhaps a packet that was misrouted, and a similar packet that was routed correctly. In this case, the cause of the problem is likely to be wherever the two packets were treated differently by the network.
We present the design of a debugger that can leverage this information using a novel concept that we call differential provenance. Differential provenance tracks the causal connections between network states and state changes, just like classical provenance, but it can additionally perform root-cause analysis by reasoning about the differences between two provenance trees. We have built a diagnostic tool that is based on differential provenance, and we have used our tool to debug a number of complex, realistic problems in two scenarios: software-defined networks and MapReduce jobs. Our results show that differential provenance can be maintained at relatively low cost, and that it can deliver very precise diagnostic information; in many cases, it can even identify the precise root cause of the problem.
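To make the idea of "reasoning about the differences between two provenance trees" concrete, here is a minimal sketch in Python. It is not the algorithm from the paper, which the abstract does not describe; it is a naive positional comparison under stated assumptions. The `Node` class, the `divergences` function, and the tuple labels (`delivered`, `packet`, `flowEntry`) are all hypothetical names chosen for illustration. The sketch assumes that the two trees have the same overall shape and that children can be aligned by position, and it reports the deepest points at which the "good" and "bad" trees disagree as root-cause candidates.

```python
from dataclasses import dataclass, field
from itertools import zip_longest
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One vertex of a hypothetical provenance tree: an event or state and its causes."""
    label: str
    children: List["Node"] = field(default_factory=list)


# (path of child indices from the root, label in good tree, label in bad tree)
Diff = Tuple[Tuple[int, ...], Optional[str], Optional[str]]


def divergences(good: Optional[Node], bad: Optional[Node],
                path: Tuple[int, ...] = ()) -> List[Diff]:
    """Return the deepest positions at which the two trees differ.

    Assumption: children are aligned by position. A difference is reported at
    a node only if none of its aligned children already explains it.
    """
    # A subtree present in only one tree is reported at its root.
    if good is None or bad is None:
        return [(path,
                 good.label if good else None,
                 bad.label if bad else None)]

    deeper: List[Diff] = []
    for i, (g, b) in enumerate(zip_longest(good.children, bad.children)):
        if g is None and b is None:
            continue
        deeper.extend(divergences(g, b, path + (i,)))

    if deeper:
        # Something further down differs; treat that as the candidate cause.
        return deeper
    if good.label != bad.label:
        return [(path, good.label, bad.label)]
    return []


# Hypothetical example: the same packet is delivered correctly in one run and
# misrouted in the other because one flow entry points at a different port.
good_tree = Node("delivered(pkt, server)", [
    Node("packet(pkt, dst=10.0.1.5)"),
    Node("flowEntry(S2, 10.0.1.0/24, port=1)"),
])
bad_tree = Node("delivered(pkt, backup)", [
    Node("packet(pkt, dst=10.0.1.5)"),
    Node("flowEntry(S2, 10.0.1.0/24, port=2)"),
])

if __name__ == "__main__":
    for where, g, b in divergences(good_tree, bad_tree):
        print(where, g, "!=", b)
```

In this toy input the roots differ, but the only difference reported is the flow entry at child index 1, because it is the deepest point of disagreement. Real provenance trees for two different packets may differ in many incidental ways, so a purely positional comparison like this one should be read only as an illustration of the intuition.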

Supplementary Material

MP4 File (p115.mp4)

Published In

SIGCOMM '16: Proceedings of the 2016 ACM SIGCOMM Conference
August 2016
645 pages
ISBN: 9781450341936
DOI: 10.1145/2934872

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Debugging
  2. Network diagnostics
  3. Provenance

Qualifiers

  • Research-article

Conference

SIGCOMM '16: ACM SIGCOMM 2016 Conference
August 22 - 26, 2016
Florianopolis, Brazil

Acceptance Rates

SIGCOMM '16 Paper Acceptance Rate: 39 of 231 submissions, 17%
Overall Acceptance Rate: 462 of 3,389 submissions, 14%

Cited By

  • (2024) AutoSketch. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, pp. 1551-1572. DOI: 10.5555/3691825.3691911. Online publication date: 16-Apr-2024
  • (2024) Automatic Configuration Repair. Proceedings of the 23rd ACM Workshop on Hot Topics in Networks, pp. 213-220. DOI: 10.1145/3696348.3696895. Online publication date: 18-Nov-2024
  • (2024) Localized Explanations for Automatically Synthesized Network Configurations. Proceedings of the 23rd ACM Workshop on Hot Topics in Networks, pp. 52-59. DOI: 10.1145/3696348.3696888. Online publication date: 18-Nov-2024
  • (2024) Differential Analysis for System Provenance. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 5649-5653. DOI: 10.1109/ICDE60146.2024.00455. Online publication date: 13-May-2024
  • (2024) SD-MDN-TM: A traceback and mitigation integrated mechanism against DDoS attacks with IP spoofing. Computer Networks, article 110793. DOI: 10.1016/j.comnet.2024.110793. Online publication date: Sep-2024
  • (2023) PUMM. Proceedings of the 32nd USENIX Conference on Security Symposium, pp. 823-840. DOI: 10.5555/3620237.3620284. Online publication date: 9-Aug-2023
  • (2023) Murphy: Performance Diagnosis of Distributed Cloud Applications. Proceedings of the ACM SIGCOMM 2023 Conference, pp. 438-451. DOI: 10.1145/3603269.3604877. Online publication date: 10-Sep-2023
  • (2023) Synthesizing Formal Network Specifications From Input-Output Examples. IEEE/ACM Transactions on Networking, 31(3), pp. 994-1009. DOI: 10.1109/TNET.2022.3208551. Online publication date: Jun-2023
  • (2023) Transparent and Tamper-Proof Event Ordering in the Internet of Things Platforms. IEEE Internet of Things Journal, 10(6), pp. 5335-5348. DOI: 10.1109/JIOT.2022.3222450. Online publication date: 15-Mar-2023
  • (2023) Aegis: Attribution of Control Plane Change Impact across Layers and Components for Cloud Systems. 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), pp. 222-233. DOI: 10.1109/ICSE-SEIP58684.2023.00026. Online publication date: May-2023
