Clustering gene expression data in SQL using locally adaptive metrics
Source Data Mining And Knowledge Discovery archive
Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
San Diego, California
SESSION: DB integration
Pages: 35 - 41  
Year of Publication: 2003
Authors
Dimitris Papadopoulos  UC Riverside
Carlotta Domeniconi  George Mason University
Dimitrios Gunopulos  UC Riverside
Sheng Ma  IBM T. J. Watson Research Center
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/882082.882091

ABSTRACT

The clustering problem concerns the discovery of homogeneous groups of data according to a certain similarity measure. Clustering suffers from the curse of dimensionality: it is not meaningful to look for clusters in high-dimensional spaces, because the average density of points anywhere in the input space is likely to be low. As a consequence, distance functions that weight all input features equally may be ineffective. We introduce an algorithm that discovers clusters in subspaces spanned by different combinations of dimensions via local weightings of features. This approach avoids the risk of information loss encountered in global dimensionality reduction techniques. Our method associates a weight vector with each cluster; its values capture the relevance of each feature within the corresponding cluster. In this paper we present an efficient SQL implementation of our algorithm, which enables the discovery of clusters on data residing inside a relational DBMS.

