Using queries for distributed monitoring and forensics
Source: European Conference on Computer Systems
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Leuven, Belgium
Session: Management
Pages: 389-402
Year of Publication: 2006
ISBN: 1-59593-322-0
Authors
Atul Singh, Rice University
Petros Maniatis, Intel Research Berkeley
Timothy Roscoe, Intel Research Berkeley
Peter Druschel, Rice University
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1217935.1217973

ABSTRACT

Distributed systems are hard to build, profile, debug, and test. Monitoring a distributed system (to detect and analyze bugs, test for regressions, or identify fault-tolerance problems or security compromises) can be difficult and error-prone. In this paper we argue that declarative development of distributed systems is well suited to tackle these tasks. We present an application logging, monitoring, and debugging facility that we have built on top of the P2 system, comprising an introspection model, an execution tracing component, and a distributed query processor. We use this facility to demonstrate a range of on-line distributed diagnosis tools, from simple, local state assertions to sophisticated global property detectors on consistent snapshots. These tools are small and simple, and can be deployed piecemeal on-line at any point during a system's life cycle. Our evaluation suggests that the overhead of continuously monitoring and improving running distributed systems with our approach is well in tune with its benefits.


