Learning correlations using the mixture-of-subsets model
Source
ACM Transactions on Knowledge Discovery from Data (TKDD)
Volume 1, Issue 4 (January 2008)
Article No. 3
Year of Publication: 2008
ISSN:1556-4681
Authors
Manas Somaiya  University of Florida, Gainesville, FL
Christopher Jermaine  University of Florida, Gainesville, FL
Sanjay Ranka  University of Florida, Gainesville, FL
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1324172.1324175

ABSTRACT

Using a mixture of random variables to model data is a tried-and-tested method common in data mining, machine learning, and statistics. With mixture modeling, it is often possible to model even complex, multimodal data accurately using very simple components. However, the classical mixture model assumes that each data point is generated by a single component of the model. Many datasets can be modeled more faithfully if this restriction is dropped. We propose a probabilistic framework, the mixture-of-subsets (MOS) model, that makes two fundamental changes to the classical mixture model. First, we allow a data point to be generated by a set of components rather than by a single component. Second, we limit the number of data attributes that each component can influence. We also propose an EM framework for learning the MOS model from a dataset, and we evaluate it experimentally on real, high-dimensional datasets. Our results show that the MOS model learned from the data accurately represents the underlying nature of the data.
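
The contrast between the two generative assumptions described in the abstract can be illustrated with a small sketch. This is not the authors' formulation: the abstract does not specify how active components are chosen, how conflicts on a shared attribute are resolved, or what generates attributes that no active component influences. The function names, the Gaussian attribute model, the independent activation probabilities, the uniform tie-break, and the background means are all assumptions made here for illustration only.

```python
import random

def sample_classical(weights, means, sigma, rng):
    """Classical mixture: one component generates every attribute of the point."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return k, [rng.gauss(m, sigma) for m in means[k]]

def sample_subset_style(active_prob, masks, means, background, sigma, rng):
    """Subset-style draw: a set of components is active, and each component
    may influence only the attributes listed in its mask.

    ASSUMPTIONS (not taken from the paper): components activate independently
    with probability active_prob[k]; when several active components cover the
    same attribute, one is picked uniformly; uncovered attributes fall back to
    a background mean.
    """
    active = [k for k, p in enumerate(active_prob) if rng.random() < p]
    point = []
    for j in range(len(background)):
        # Active components that are allowed to influence attribute j.
        owners = [k for k in active if j in masks[k]]
        if owners:
            mu = means[rng.choice(owners)][j]  # assumed uniform tie-break
        else:
            mu = background[j]                 # assumed background default
        point.append(rng.gauss(mu, sigma))
    return active, point

if __name__ == "__main__":
    rng = random.Random(7)
    means = [[1.0, 1.0, 1.0, 1.0], [9.0, 9.0, 9.0, 9.0]]
    print(sample_classical([0.5, 0.5], means, 0.1, rng))
    # Component 0 may touch attributes 0 and 1; component 1 only attribute 3.
    print(sample_subset_style([0.7, 0.4], [{0, 1}, {3}], means,
                              [0.0, 0.0, 0.0, 0.0], 0.1, rng))
```

In the classical draw every attribute comes from the same component, whereas in the subset-style draw a single point can mix attributes shaped by different components, with each component confined to its own small set of attributes.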


