
High Confidence Rule Mining for Microarray Analysis

Published: 01 October 2007

Abstract

We present an association rule mining method for discovering high confidence rules, which describe interesting gene relationships in microarray datasets. Microarray datasets typically contain an order of magnitude more genes than experiments, which renders many data mining methods impractical because they are optimised for sparse datasets. A new family of row-enumeration rule mining algorithms has emerged to facilitate mining in dense datasets. These algorithms reduce the search space by using the support measure to prune infrequent relationships. This is a major shortcoming: many potentially interesting rules with low support but high confidence are pruned. We propose a new row-enumeration rule mining method, MaxConf, to mine high confidence rules from microarray data. MaxConf is a support-free algorithm that uses the confidence measure directly to prune the search space effectively. Experiments on three microarray datasets show that MaxConf outperforms support-based rule mining with respect to scalability and rule extraction. Furthermore, detailed biological analyses demonstrate the effectiveness of our approach: the rules discovered by MaxConf are substantially more interesting and meaningful than those found by support-based methods.
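To make the support versus confidence trade-off concrete, here is a minimal Python sketch on a hypothetical toy dataset. It assumes the expression data has already been discretized so that each row is one experiment and each item (names such as `G1+` or `G3-` are invented for illustration) marks a gene as up- or down-regulated. It uses the standard definitions supp(X) = fraction of rows containing X and conf(X → Y) = supp(X ∪ Y) / supp(X), brute-forces single-item rules, and compares a support-plus-confidence filter with a confidence-only filter. The thresholds are arbitrary assumptions. The helper `closed_itemsets_by_rows` is a naive illustration of the row-enumeration idea (enumerate row subsets, not item subsets); it does not reproduce MaxConf's confidence-based pruning.

```python
from itertools import combinations, permutations

# Hypothetical toy data, assumed already discretized: each row is one experiment
# (sample) and each item marks a gene as up- ("+") or down-regulated ("-").
ROWS = [
    {"G1+", "G2+", "G3-"},
    {"G1+", "G2+", "G4+"},
    {"G1+", "G2+", "G3-", "G4+"},
    {"G3-", "G4+"},
    {"G5+", "G6-"},  # G5+ and G6- co-occur in a single experiment only
    {"G2+", "G4+"},
]


def support(itemset, rows):
    """Fraction of rows (experiments) that contain every item in itemset."""
    return sum(1 for r in rows if itemset <= r) / len(rows)


def confidence(antecedent, consequent, rows):
    """conf(A -> C) = supp(A | C) / supp(A); returns 0.0 when supp(A) is 0."""
    s_a = support(antecedent, rows)
    return support(antecedent | consequent, rows) / s_a if s_a else 0.0


def mine_rules(rows, min_conf, min_sup=0.0):
    """Brute-force single-item rules a -> c.

    With min_sup > 0 this mimics support-based filtering; with the default
    min_sup=0.0 only confidence is used (support-free). Rules whose items never
    co-occur are skipped. Returns (a, c, support, confidence) tuples, rounded.
    """
    items = sorted(set().union(*rows))
    rules = []
    for a, c in permutations(items, 2):
        ante, cons = frozenset([a]), frozenset([c])
        sup = support(ante | cons, rows)
        if sup == 0 or sup < min_sup:
            continue  # support pruning (or no co-occurrence at all)
        conf = confidence(ante, cons, rows)
        if conf >= min_conf:
            rules.append((a, c, round(sup, 3), round(conf, 3)))
    return rules


def closed_itemsets_by_rows(rows):
    """Naive row enumeration: the items shared by every row of a row subset
    form a closed itemset. The loop runs over 2**len(rows) row subsets rather
    than over item subsets, which is attractive when there are far fewer
    experiments than genes. Real row-enumeration miners prune this search."""
    closed = set()
    for k in range(1, len(rows) + 1):
        for subset in combinations(rows, k):
            common = frozenset.intersection(*map(frozenset, subset))
            if common:
                closed.add(common)
    return closed


if __name__ == "__main__":
    print(mine_rules(ROWS, min_conf=0.9, min_sup=0.3))
    # [('G1+', 'G2+', 0.5, 1.0)]
    print(mine_rules(ROWS, min_conf=0.9))
    # [('G1+', 'G2+', 0.5, 1.0), ('G5+', 'G6-', 0.167, 1.0), ('G6-', 'G5+', 0.167, 1.0)]
    print(len(closed_itemsets_by_rows(ROWS)))
    # 10
```

With `min_sup=0.3`, the rule `G5+ -> G6-` (confidence 1.0, support about 0.167) is discarded, while the confidence-only run keeps it; this is the kind of low-support, high-confidence rule that support pruning loses. In a toy set this small such a rule rests on a single experiment, so the example shows only the filtering behaviour, not biological significance.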

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 4, Issue 4
October 2007
192 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Publication History

Published in TCBB Volume 4, Issue 4

Author Tags

  1. Data mining
  2. association rules
  3. high confidence rule mining
  4. microarray analysis

Cited By

  • (2017) Discovering non-compliant window co-occurrence patterns. Geoinformatica 21:4, pp. 829-866. DOI: 10.1007/s10707-016-0289-3. Online publication date: 1-Oct-2017
  • (2016) Using the two-population genetic algorithm with distance-based k-nearest neighbour voting classifier for high-dimensional data. International Journal of Data Mining and Bioinformatics 14:4, pp. 315-331. DOI: 10.1504/IJDMB.2016.075820. Online publication date: 1-Apr-2016
  • (2015) PMCR-Miner. International Journal of Data Mining and Bioinformatics 13:3, pp. 225-247. DOI: 10.1504/IJDMB.2015.072091. Online publication date: 1-Sep-2015
  • (2015) Two stages weighted sampling strategy for detecting the relation between gene expression and disease. International Journal of Data Mining and Bioinformatics 12:2, pp. 207-223. DOI: 10.1504/IJDMB.2015.069417. Online publication date: 1-May-2015
  • (2013) An efficient and scalable algorithm for mining maximal. Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition, pp. 352-366. DOI: 10.1007/978-3-642-39712-7_27. Online publication date: 19-Jul-2013
  • (2012) Constructing gene regulatory networks from microarray data using GA/PSO with DTW. Applied Soft Computing 12:3, pp. 1115-1124. DOI: 10.1016/j.asoc.2011.11.013. Online publication date: 1-Mar-2012
  • (2010) A gene selection method for microarray data based on sampling. Proceedings of the Second international conference on Computational collective intelligence: technologies and applications - Volume Part II, pp. 68-74. DOI: 10.5555/1948645.1948655. Online publication date: 10-Nov-2010
  • (2009) An association analysis approach to biclustering. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 677-686. DOI: 10.1145/1557019.1557095. Online publication date: 28-Jun-2009
  • (2008) Identification of Co-regulated Signature Genes in Pancreas Cancer- A Data Mining Approach. Proceedings of the 4th international conference on Intelligent Computing: Advanced Intelligent Computing Theories and Applications - with Aspects of Theoretical and Methodological Issues, pp. 138-145. DOI: 10.1007/978-3-540-87442-3_19. Online publication date: 15-Sep-2008
