ABSTRACT
Grid systems are proving increasingly useful for managing the batch computing jobs of organizations. One well-known example is Intel, whose internally developed NetBatch system manages tens of thousands of machines. The size, heterogeneity, and complexity of grid systems, however, make them very difficult to configure. This often results in misconfigured machines, which may adversely affect the entire system.

We investigate a distributed data mining approach for detecting misconfigured machines. Our Grid Monitoring System (GMS) non-intrusively collects data from all sources (log files, system services, etc.) available throughout the grid system. It converts the raw data to semantically meaningful data and stores it on the machine from which it was obtained, limiting the incurred overhead and allowing scalability. Afterwards, when analysis is requested, a distributed outlier detection algorithm is employed to identify misconfigured machines. The algorithm itself is implemented as a recursive workflow of grid jobs. It is especially suited to grid systems, in which machines might be unavailable most of the time and often fail altogether.
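To make the idea of flagging misconfigured machines as outliers concrete, the following is a minimal, centralized sketch of one common distance-based definition: a machine is scored by its distance to its k-th nearest neighbour, and the highest-scoring machines are reported. This is an illustration only, not the distributed algorithm or the recursive grid-job workflow described above. The function names, the machine names, and the two features (mean job runtime and failure rate) are all hypothetical.

```python
import math


def knn_outlier_scores(features, k=2):
    """Score each machine by the distance to its k-th nearest neighbour.

    `features` maps a machine name to a numeric feature tuple.
    Assumes there are at least k + 1 machines.
    """
    scores = {}
    for name, vec in features.items():
        # Distances from this machine to every other machine, ascending.
        dists = sorted(
            math.dist(vec, other)
            for other_name, other in features.items()
            if other_name != name
        )
        scores[name] = dists[k - 1]
    return scores


def top_outliers(features, k=2, n=1):
    """Return the n machine names with the largest k-NN distance."""
    scores = knn_outlier_scores(features, k)
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical per-machine summaries: (mean job runtime in s, failure rate).
machines = {
    "node-a": (120.0, 0.01),
    "node-b": (118.0, 0.02),
    "node-c": (123.0, 0.01),
    "node-d": (410.0, 0.35),
}

suspects = top_outliers(machines, k=2, n=1)
print(suspects)
```

In this toy data set, node-d lies far from the other three machines in feature space, so it receives the largest score. A real deployment would need to normalize features with different scales and, as the abstract stresses, tolerate machines that are unavailable when the analysis runs.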