Deriving quantitative models for correlation clusters
Source
Conference on Knowledge Discovery in Data
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Philadelphia, PA, USA
SESSION: Research track papers
Pages: 4 - 13
Year of Publication: 2006
ISBN: 1-59593-339-5
Authors
Elke Achtert, Ludwig-Maximilians-Universität München, Munich, Germany
Christian Böhm, Ludwig-Maximilians-Universität München, Munich, Germany
Hans-Peter Kriegel, Ludwig-Maximilians-Universität München, Munich, Germany
Peer Kröger, Ludwig-Maximilians-Universität München, Munich, Germany
Arthur Zimek, Ludwig-Maximilians-Universität München, Munich, Germany
ABSTRACT
Correlation clustering aims at grouping the data set into correlation clusters such that the objects in the same cluster exhibit a certain density and are all associated with a common, arbitrarily oriented hyperplane of arbitrary dimensionality. Several algorithms for this task have been proposed recently. However, all of these algorithms compute only the partitioning of the data into clusters. This is only a first step in the pipeline of advanced data analysis and system modelling. The second (post-clustering) step of deriving a quantitative model for each correlation cluster has not been addressed so far. In this paper, we describe an original approach to handle this second step. We introduce a general method that can extract quantitative information on the linear dependencies within a correlation clustering. Our concepts are independent of the clustering model and can thus be applied as a post-processing step to any correlation clustering algorithm. Furthermore, we show how these quantitative models can be used to predict the probability distribution describing which of these models created a given object. Our broad experimental evaluation demonstrates the beneficial impact of our method on several applications of significant practical importance.
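To make the post-clustering step concrete, the following is a minimal sketch of one way such a quantitative model could be derived, assuming a PCA-based reading of the abstract rather than the method as specified in the paper. The eigenvectors of a cluster's covariance matrix with small eigenvalues are treated as normal vectors of the cluster's hyperplane, which yields a system of linear equations. The function names, the `explained` variance threshold, and the Gaussian weighting with parameter `sigma` are all illustrative assumptions and do not come from the paper.

```python
import numpy as np


def derive_linear_model(points, explained=0.85):
    """Return (mean, normals) so that normals @ (x - mean) ~= 0 for cluster members.

    `explained` is an assumed threshold: the fraction of total variance that
    the "strong" eigenvectors (those spanning the hyperplane) must cover.
    """
    X = np.asarray(points, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort to descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    n_strong = int(np.searchsorted(ratio, explained)) + 1
    normals = eigvecs[:, n_strong:].T           # weak eigenvectors, one per row
    return mean, normals


def distance_to_model(x, mean, normals):
    """Euclidean distance of x from the hyperplane described by the model."""
    return float(np.linalg.norm(normals @ (np.asarray(x, dtype=float) - mean)))


def membership_probabilities(x, models, sigma=1.0):
    """Assumed Gaussian weighting of distances, normalised over all models."""
    d = np.array([distance_to_model(x, m, n) for m, n in models])
    logits = -0.5 * (d / sigma) ** 2
    logits -= logits.max()                      # guard against underflow
    w = np.exp(logits)
    return w / w.sum()
```

Each row of `normals` gives the coefficients of one linear equation that the cluster members approximately satisfy, so a cluster lying on a line in three dimensions yields two equations. Because the sketch consumes only the points assigned to each cluster, it can follow any correlation clustering algorithm, in line with the abstract's claim of model independence.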