
High Confidence Rule Mining for Microarray Analysis

Authors:

Sanjay Chawla

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), Volume 4, Issue 4

Pages 611-623

https://doi.org/10.1109/tcbb.2007.1050

Published: 01 October 2007

Abstract

We present an association rule mining method for mining high confidence rules, which describe interesting gene relationships, from microarray datasets. Microarray datasets typically contain an order of magnitude more genes than experiments, which renders many data mining methods impractical because they are optimised for sparse datasets. A new family of row-enumeration rule mining algorithms has emerged to facilitate mining in dense datasets. These algorithms reduce the search space by using the support measure to prune infrequent relationships. This reliance on support is a major shortcoming: it prunes many potentially interesting rules that have low support but high confidence. We propose a new row-enumeration rule mining method, MaxConf, to mine high confidence rules from microarray data. MaxConf is a support-free algorithm that uses the confidence measure directly to prune the search space effectively. Experiments on three microarray datasets show that MaxConf outperforms support-based rule mining in both scalability and rule extraction. Furthermore, detailed biological analyses demonstrate the effectiveness of our approach: the rules discovered by MaxConf are substantially more interesting and meaningful than those found by support-based methods.
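To make the support/confidence distinction concrete, the following Python sketch uses a small hypothetical dataset in which each row is one experiment and each item is a discretized gene state. The gene names (`G1_up`, `G3_down`, and so on), the helper functions, and the 0.3 support threshold are all invented for illustration. The sketch computes support and confidence for a rule, shows that a minimum-support threshold would discard a rule whose confidence is 1.0, and includes a naive row-enumeration step that intersects subsets of rows to obtain closed gene sets. It illustrates only the general ideas named in the abstract; it is not MaxConf, whose confidence-based pruning strategy is not described here.

```python
from itertools import combinations

# Hypothetical toy data: each row is one experiment (sample); each item is a
# discretized gene state, e.g. "G1_up" means gene G1 is over-expressed.
DATA = [
    {"G1_up", "G2_up", "G3_down"},
    {"G1_up", "G2_up"},
    {"G3_down", "G4_up"},
    {"G3_down", "G4_up", "G5_up"},
    {"G4_up", "G5_up"},
    {"G2_up", "G4_up"},
    {"G5_up", "G3_down"},
    {"G4_up", "G3_down"},
]


def rows_containing(itemset, data):
    """Indices of the rows that contain every item in `itemset`."""
    return {i for i, row in enumerate(data) if set(itemset) <= row}


def support(itemset, data):
    """Fraction of rows that contain the whole itemset."""
    return len(rows_containing(itemset, data)) / len(data)


def confidence(antecedent, consequent, data):
    """Fraction of rows containing the antecedent that also contain the consequent."""
    covered = rows_containing(antecedent, data)
    if not covered:
        return 0.0
    both = rows_containing(set(antecedent) | set(consequent), data)
    return len(both) / len(covered)


def closed_sets_by_row_enumeration(data):
    """Naive row enumeration: intersect every non-empty subset of rows.

    Every non-empty intersection of rows is a closed itemset. The loop visits
    all 2**n - 1 row subsets, so it is feasible only when rows are few, as in
    microarray data; practical row-enumeration algorithms prune this search.
    """
    closed = {}
    for k in range(1, len(data) + 1):
        for rows in combinations(range(len(data)), k):
            common = frozenset.intersection(*(frozenset(data[r]) for r in rows))
            if common:
                # Record every supporting row, not just the subset enumerated.
                closed[common] = rows_containing(common, data)
    return closed


if __name__ == "__main__":
    min_support = 0.3  # hypothetical threshold for a support-based miner
    antecedent, consequent = {"G1_up"}, {"G2_up"}
    s = support(antecedent | consequent, DATA)
    c = confidence(antecedent, consequent, DATA)
    print(f"support={s:.2f} confidence={c:.2f} pruned={s < min_support}")
    # prints: support=0.25 confidence=1.00 pruned=True
    for items, rows in sorted(closed_sets_by_row_enumeration(DATA).items(),
                              key=lambda kv: sorted(kv[0])):
        print(sorted(items), sorted(rows))
```

In this toy data the rule G1_up => G2_up holds in every experiment where G1 is up (confidence 1.0) but covers only 2 of 8 experiments, so a miner with a 0.3 support threshold never reports it. This is the kind of low-support, high-confidence rule that the abstract argues support-based pruning loses.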

Information

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics Volume 4, Issue 4

October 2007

192 pages

ISSN:1545-5963

Publisher

IEEE Computer Society Press

Washington, DC, United States

Publication History

Published: 01 October 2007

Published in TCBB Volume 4, Issue 4

Qualifiers

Article

Bibliometrics & Citations

Article Metrics

Total Citations: 9
Total Downloads: 400
Downloads (last 12 months): 5
Downloads (last 6 weeks): 0

Reflects downloads up to 07 Mar 2025

Cited By

R. Ali, V. Gunturi, A. Kotz, E. Eftelioglu, S. Shekhar, and W. Northrop, "Discovering non-compliant window co-occurrence patterns," Geoinformatica, vol. 21, no. 4, pp. 829-866, 2017. doi:10.1007/s10707-016-0289-3. Online publication date: 1-Oct-2017.
https://dl.acm.org/doi/10.1007/s10707-016-0289-3

C. Lee and W. Lin, "Using the two-population genetic algorithm with distance-based k-nearest neighbour voting classifier for high-dimensional data," International Journal of Data Mining and Bioinformatics, vol. 14, no. 4, pp. 315-331, 2016. doi:10.1504/IJDMB.2016.075820. Online publication date: 1-Apr-2016.
https://dl.acm.org/doi/10.1504/IJDMB.2016.075820

W. Zakaria, Y. Kotb, and F. Ghaleb, "PMCR-Miner," International Journal of Data Mining and Bioinformatics, vol. 13, no. 3, pp. 225-247, 2015. doi:10.1504/IJDMB.2015.072091. Online publication date: 1-Sep-2015.
https://dl.acm.org/doi/10.1504/IJDMB.2015.072091

C. Yang, W. Lin, C. Lee, and Y. Leu, "Two stages weighted sampling strategy for detecting the relation between gene expression and disease," International Journal of Data Mining and Bioinformatics, vol. 12, no. 2, pp. 207-223, 2015. doi:10.1504/IJDMB.2015.069417. Online publication date: 1-May-2015.
https://dl.acm.org/doi/10.1504/IJDMB.2015.069417

W. Allah, Y. El Sayed, and F. Mohamed Ghaleb, "An efficient and scalable algorithm for mining maximal," Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition, pp. 352-366, 2013. doi:10.1007/978-3-642-39712-7_27. Online publication date: 19-Jul-2013.
https://dl.acm.org/doi/10.1007/978-3-642-39712-7_27

C. Lee, Y. Leu, and W. Yang, "Constructing gene regulatory networks from microarray data using GA/PSO with DTW," Applied Soft Computing, vol. 12, no. 3, pp. 1115-1124, 2012. doi:10.1016/j.asoc.2011.11.013. Online publication date: 1-Mar-2012.
https://dl.acm.org/doi/10.1016/j.asoc.2011.11.013

Y. Leu, C. Lee, and H. Tsai, "A gene selection method for microarray data based on sampling," Proceedings of the Second international conference on Computational collective intelligence: technologies and applications - Volume Part II, pp. 68-74, 2010. doi:10.5555/1948645.1948655. Online publication date: 10-Nov-2010.
https://dl.acm.org/doi/10.5555/1948645.1948655

G. Pandey, G. Atluri, M. Steinbach, C. Myers, and V. Kumar, "An association analysis approach to biclustering," Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (J. Elder, F. Fogelman, P. Flach, and M. Zaki, eds.), pp. 677-686, 2009. doi:10.1145/1557019.1557095. Online publication date: 28-Jun-2009.
https://dl.acm.org/doi/10.1145/1557019.1557095

K. Seeja, M. Alam, and S. Jain, "Identification of Co-regulated Signature Genes in Pancreas Cancer - A Data Mining Approach," Proceedings of the 4th international conference on Intelligent Computing: Advanced Intelligent Computing Theories and Applications - with Aspects of Theoretical and Methodological Issues, pp. 138-145, 2008. doi:10.1007/978-3-540-87442-3_19. Online publication date: 15-Sep-2008.
https://dl.acm.org/doi/10.1007/978-3-540-87442-3_19
