ABSTRACT
Advances in genomic technologies have allowed vast amounts of gene expression data to be collected. Protein functional annotation and biological module discovery that are based on a single gene expression data set suffer from spurious coexpression. Recent work has focused on integrating multiple independent gene expression data sets. In this paper, we propose a two-step approach for mining maximally frequent collections of highly connected modules from coexpression graphs. We first mine maximal frequent edge-sets and then extract highly connected subgraphs from the edge-induced subgraphs. Experimental results on the collection of modules mined from 52 human gene expression data sets show that coexpression links that occur together in a significant number of experiments have a modular topological structure. Moreover, GO enrichment analysis shows that the proposed approach discovers biologically significant frequent collections of modules.
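The two-step approach described above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes each coexpression graph is given as a set of undirected edges, uses a brute-force level-wise search in place of a dedicated maximal itemset miner, and uses k-core components as a stand-in for "highly connected subgraphs", whose exact definition the abstract does not give. All function names, the `min_support` and `k` parameters, and the toy gene labels are hypothetical.

```python
from itertools import combinations


def support(edge_set, graphs):
    """Count the graphs that contain every edge in edge_set."""
    return sum(1 for g in graphs if edge_set <= g)


def maximal_frequent_edge_sets(graphs, min_support):
    """Step 1 (assumed variant): level-wise search, keeping only maximal sets."""
    items = sorted({e for g in graphs for e in g})
    level = [frozenset([e]) for e in items
             if support({e}, graphs) >= min_support]
    frequent = []
    while level:
        frequent.extend(level)
        # Join pairs of frequent k-sets that differ in exactly one edge.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates if support(c, graphs) >= min_support]
    # A set is maximal if no other frequent set strictly contains it.
    return [s for s in frequent if not any(s < t for t in frequent)]


def k_core_components(edges, k):
    """Step 2 (assumed variant): connected components of the k-core
    of the subgraph induced by the given edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Repeatedly peel off vertices with fewer than k remaining neighbours.
    changed = True
    while changed:
        changed = False
        for n in [n for n, nb in adj.items() if len(nb) < k]:
            for m in adj.pop(n):
                if m in adj:
                    adj[m].discard(n)
            changed = True
    # Collect connected components of what is left.
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        components.append(frozenset(comp))
    return components


def mine_module_collections(graphs, min_support, k):
    """Return (edge_set, modules) pairs that yield at least one module."""
    results = []
    for edge_set in maximal_frequent_edge_sets(graphs, min_support):
        modules = k_core_components(edge_set, k)
        if modules:
            results.append((edge_set, modules))
    return results


if __name__ == "__main__":
    # Toy data: two triangles that co-occur in two of three graphs.
    tri_abc = {("A", "B"), ("A", "C"), ("B", "C")}
    tri_def = {("D", "E"), ("D", "F"), ("E", "F")}
    graphs = [
        frozenset(tri_abc | tri_def | {("C", "D")}),
        frozenset(tri_abc | tri_def),
        frozenset(tri_abc | {("X", "Y")}),
    ]
    for edge_set, modules in mine_module_collections(graphs, min_support=2, k=2):
        print(sorted(edge_set), [sorted(m) for m in modules])
```

The level-wise search is exponential in the number of frequent edges, so it only suits toy inputs. A dedicated maximal frequent itemset miner would be needed at the scale of dozens of real expression data sets.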