ABSTRACT
Distributed systems are hard to build, profile, debug, and test. Monitoring a distributed system (to detect and analyze bugs, test for regressions, and identify fault-tolerance problems or security compromises) can be difficult and error-prone. In this paper we argue that declarative development of distributed systems is well suited to these tasks. We present an application logging, monitoring, and debugging facility that we have built on top of the P2 system, comprising an introspection model, an execution tracing component, and a distributed query processor. We use this facility to demonstrate a variety of on-line distributed diagnosis tools, ranging from simple local state assertions to sophisticated global property detectors on consistent snapshots. These tools are small and simple, and they can be deployed piecemeal on-line at any point during a system's life cycle. Our evaluation suggests that the overhead of using our approach to monitor and improve running distributed systems continuously is commensurate with its benefits.