research-article

ABBA: adaptive bicluster-based approach to impute missing values in binary matrices

Authors:
Alessandro Colantonio, Engiweb Security, Roma, Italy
Roberto Di Pietro, Università di Roma Tre, Roma, Italy
Alberto Ocello, Engiweb Security, Roma, Italy
Nino Vincenzo Verde, Università di Roma Tre, Roma, Italy

SAC '10: Proceedings of the 2010 ACM Symposium on Applied Computing, March 2010, Pages 1026–1033. https://doi.org/10.1145/1774088.1774304

Published: 22 March 2010

ABSTRACT

Missing values frequently cause problems in the analysis of binary matrices because they can hinder downstream analysis of the datasets. Although many imputation methods have been developed to replace missing values with estimates, the available techniques share two disadvantages: they require some parameters to be fixed (e.g., number of patterns, number of rows to consider) in order to estimate missing values, with little theoretical support for choosing them; and missing values must be recomputed from scratch whenever the parameters change.

In this paper we propose a novel algorithm, ABBA (Adaptive Bicluster-Based Approach), that does not have the above limitations. We also detail a formal framework that justifies the rationale behind ABBA. Finally, experimental results on both synthetic and real data confirm the viability of our approach and the quality of its results, which surpasses that achieved by the main competing algorithm (KNN).
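
For context on the parameter dependence that the abstract criticizes, a generic k-nearest-neighbour imputation for a binary matrix can be sketched as follows. This is only an illustrative baseline: it is not ABBA, and it is not necessarily the KNN configuration used in the paper's experiments. The distance (Hamming distance over jointly observed columns), the majority vote with ties resolved to 0, the function names, and the toy data are all assumptions made for this sketch.

```python
from typing import List, Optional

Row = List[Optional[int]]  # each cell is 0, 1, or None for a missing value


def hamming_on_shared(a: Row, b: Row) -> Optional[float]:
    """Fraction of disagreeing cells among columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None  # the two rows cannot be compared
    return sum(x != y for x, y in shared) / len(shared)


def knn_impute(matrix: List[Row], k: int) -> List[Row]:
    """Fill each missing cell by majority vote of the k nearest rows.

    Assumed design for illustration: neighbours are ranked by
    hamming_on_shared, and vote ties resolve to 0.
    """
    result = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, cell in enumerate(row):
            if cell is not None:
                continue
            # Candidate neighbours: other rows that observe column j.
            candidates = []
            for r, other in enumerate(matrix):
                if r == i or other[j] is None:
                    continue
                d = hamming_on_shared(row, other)
                if d is not None:
                    candidates.append((d, r))
            candidates.sort()
            votes = [matrix[r][j] for _, r in candidates[:k]]
            if votes:
                result[i][j] = 1 if 2 * sum(votes) > len(votes) else 0
    return result


# Toy data (hypothetical): one missing cell in the first row.
data = [
    [1, 1, 0, None],
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
]
filled_k2 = knn_impute(data, k=2)
filled_k5 = knn_impute(data, k=5)
```

With this toy matrix, the missing cell in the first row is filled with 1 for k=2 and with 0 for k=5. Changing k therefore changes the estimate and requires the whole imputation to be rerun, which is the kind of limitation the abstract attributes to existing methods.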

Index Terms

- Computing methodologies → Machine learning → Learning paradigms → Unsupervised learning → Cluster analysis
- Information systems → Information systems applications → Data mining

Published in
SAC '10: Proceedings of the 2010 ACM Symposium on Applied Computing
March 2010
2712 pages
ISBN: 9781605586397
DOI: 10.1145/1774088
Conference Chairs: Sung Y. Shin (South Dakota State University), Sascha Ossowski (University Rey Juan Carlos, Spain), Michael Schumacher (University of Applied Sciences Western Switzerland, Switzerland)
Program Chairs: Mathew J. Palakal (Indiana University Purdue University), Chih-Cheng Hung (Southern Polytechnic State University)
Copyright © 2010 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
binary matrix
missing values
pseudo-biclusters
Acceptance Rates
SAC '10 Paper Acceptance Rate: 364 of 1,353 submissions, 27%
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%