ABSTRACT
An important dimension of complex networks is embedded in the weights of their edges. Incorporating this source of information into the analysis of a network can greatly enhance our understanding of it. This is the case for gene co-expression networks, which encapsulate information about the strength of correlation between gene expression profiles. Classical unweighted gene co-expression networks use thresholding to define connectivity, losing some of the information contained in the different connection strengths. In this paper, we propose a mining method capable of extracting information from weighted gene co-expression networks. We study groups of differently connected nodes and their importance as network motifs. We define a subgraph as a motif if the weights of the edges inside it follow a distribution significantly different from the one expected at random. We use the Kolmogorov-Smirnov test to calculate the significance score of the subgraph, avoiding the time-consuming generation of random networks to determine statistical significance. We apply our approach to gene co-expression networks related to three different types of cancer and to two healthy datasets. We compare the structure of the networks using weighted motif profiles, and our results show that we are able to clearly distinguish the networks and separate them by type. We also compare the biological relevance of our weighted approach to that of a more classical binary motif profile, in which edges are unweighted, using shared Gene Ontology annotations on biological processes, cellular components and molecular functions. The results of gene enrichment analysis show that weighted motifs are biologically more significant than binary motifs.
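The significance test described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the reference ("random") distribution is simply the set of all edge weights in the network, uses the standard asymptotic approximation for the two-sample Kolmogorov-Smirnov p-value, and all function names and the toy network are hypothetical.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the largest gap is attained at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided p-value; only a rough guide for small samples."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:  # the series converges to 1 here but does so slowly
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def subgraph_weights(edges, nodes):
    """Weights of the edges whose two endpoints both lie in `nodes`."""
    nodes = set(nodes)
    return [w for (u, v), w in edges.items() if u in nodes and v in nodes]


def motif_score(edges, nodes):
    """KS statistic and p-value of a subgraph's weights against the network.

    Assumption: the background sample is every edge weight in the network,
    including the edges of the subgraph itself.
    """
    inside = subgraph_weights(edges, nodes)
    background = list(edges.values())
    if not inside or not background:
        raise ValueError("both samples must contain at least one edge weight")
    d = ks_statistic(inside, background)
    return d, ks_pvalue(d, len(inside), len(background))


# Hypothetical toy network: a strongly co-expressed triangle (g1, g2, g3)
# embedded in weakly correlated surroundings.
toy_edges = {
    ("g1", "g2"): 0.92, ("g1", "g3"): 0.88, ("g2", "g3"): 0.95,
    ("g3", "g4"): 0.15, ("g4", "g5"): 0.22, ("g5", "g6"): 0.31,
    ("g4", "g6"): 0.18, ("g1", "g6"): 0.27,
}
d, p = motif_score(toy_edges, {"g1", "g2", "g3"})
print(f"KS statistic = {d:.3f}, approximate p-value = {p:.3f}")
```

Because the score comes from a single comparison of two weight samples, no ensemble of randomized networks has to be generated, which is the saving the abstract refers to. With samples as small as in this toy network the asymptotic p-value is only indicative.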
Index Terms
Discovering weighted motifs in gene co-expression networks