MFMS: maximal frequent module set mining from multiple human gene expression data sets

Authors:
Saeed Salem, North Dakota State University, Fargo, ND
Cagri Ozcaglar, Bank of America Merrill Lynch

BioKDD '13: Proceedings of the 12th International Workshop on Data Mining in Bioinformatics, August 2013, Pages 51–57
https://doi.org/10.1145/2500863.2500869

Published: 11 August 2013

ABSTRACT

Advances in genomic technologies have allowed vast amounts of gene expression data to be collected. Protein functional annotation and biological module discovery based on a single gene expression data set suffer from spurious coexpression. Recent work has focused on integrating multiple independent gene expression data sets. In this paper, we propose a two-step approach for mining maximal frequent collections of highly connected modules from coexpression graphs. We first mine maximal frequent edge-sets and then extract highly connected subgraphs from the edge-induced subgraphs. Experimental results on the collection of modules mined from 52 human gene expression data sets show that coexpression links that occur together in a significant number of experiments have a modular topological structure. Moreover, GO enrichment analysis shows that the proposed approach discovers biologically significant frequent collections of modules.
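
The two-step idea in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the brute-force level-wise search, the density-based reading of "highly connected", the function names, the thresholds (`min_support`, `min_density`, `min_size`), and the toy graphs are all assumptions made for illustration only.

```python
from itertools import combinations

# Toy coexpression graphs (hypothetical data): each graph is a set of
# undirected edges stored as sorted (gene_u, gene_v) tuples.
GRAPHS = [
    {("a", "b"), ("a", "c"), ("b", "c"), ("d", "e")},
    {("a", "b"), ("a", "c"), ("b", "c"), ("d", "e"), ("c", "d")},
    {("a", "b"), ("a", "c"), ("b", "c"), ("e", "f")},
]


def support(edge_set, graphs):
    """Count the graphs that contain every edge in edge_set."""
    return sum(1 for g in graphs if edge_set <= g)


def maximal_frequent_edge_sets(graphs, min_support):
    """Step 1 (assumed brute-force variant): level-wise search for frequent
    edge-sets, followed by a filter that keeps only the maximal ones.
    Exponential in the worst case, so suitable for toy inputs only."""
    edges = sorted({e for g in graphs for e in g})
    level = [frozenset([e]) for e in edges
             if support(frozenset([e]), graphs) >= min_support]
    frequent = []
    while level:
        frequent.extend(level)
        # Join two sets of size k that differ in exactly one edge.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates if support(c, graphs) >= min_support]
    # A frequent set is maximal if no frequent proper superset exists.
    return [s for s in frequent if not any(s < t for t in frequent)]


def dense_components(edge_set, min_density=0.6, min_size=3):
    """Step 2 (assumed criterion): take connected components of the
    edge-induced subgraph and keep those with enough vertices and density."""
    adj = {}
    for u, v in edge_set:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, modules = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        n = len(comp)
        m = sum(len(adj[x]) for x in comp) // 2  # edges inside the component
        if n >= min_size and m / (n * (n - 1) / 2) >= min_density:
            modules.append(frozenset(comp))
    return modules


for edge_set in maximal_frequent_edge_sets(GRAPHS, min_support=2):
    modules = dense_components(edge_set)
    print(sorted(edge_set), "->", [sorted(m) for m in modules])
```

On this toy input with `min_support=2`, the only maximal frequent edge-set is the triangle a-b-c together with the edge d-e, and only the triangle survives the size and density filter as a module. A practical system would replace the brute-force search with a dedicated maximal frequent itemset miner and would likely use a stricter connectivity criterion than plain density.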

Published in
BioKDD '13: Proceedings of the 12th International Workshop on Data Mining in Bioinformatics
August 2013
64 pages
ISBN: 9781450323277
DOI: 10.1145/2500863
General Chairs:
Jake Chen, Indiana University School of Informatics, Purdue University School of Science Indiana Center for Systems Biology and Personalized Medicine, Indianapolis, IN
Mohammed Zaki, Rensselaer Polytechnic Institute, Troy, NY
Program Chairs:
Gaurav Pandey, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY
Huzefa Rangwala, George Mason University, Fairfax, VA
George Karypis, University of Minnesota, Minneapolis, MN
Copyright © 2013 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
BioKDD '13 paper acceptance rate: 7 of 16 submissions, 44%
Overall acceptance rate: 7 of 16 submissions, 44%