An Information Theoretic Exploratory Method for Learning Patterns of Conditional Gene Coexpression from Microarray Data

Published: 01 January 2008

Abstract

In this article, we introduce an exploratory framework for learning patterns of conditional co-expression in gene expression data. The main idea behind the proposed approach is to estimate how the information content shared by a set of M nodes in a network (where each node is associated with an expression profile) varies upon conditioning on a set of L conditioning variables (in the simplest case, represented by a separate set of expression profiles). The method is non-parametric and is based on the concept of statistical co-information, which, unlike conventional correlation-based techniques, is not restricted in scope to linear conditional dependency patterns. Moreover, such conditional co-expression relationships can potentially indicate regulatory interactions that do not manifest themselves when only pair-wise relationships are considered. A moment-based approximation of the co-information measure is derived that efficiently gets around the problem of estimating high-dimensional multivariate probability density functions from the data, a task usually not viable because of the intrinsic sample size limitations that characterize expression level measurements. By applying the proposed exploratory method, we analyzed a whole-genome microarray assay of the eukaryote Saccharomyces cerevisiae and were able to learn statistically significant patterns of conditional co-expression. A selection of such interactions that carry a meaningful biological interpretation is discussed.
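
As a rough illustration of the quantity the abstract refers to, the sketch below computes a three-variable co-information, I(X;Y;Z) = I(X;Y) - I(X;Y|Z), for the smallest case of M = 2 nodes and L = 1 conditioning variable. It is not the moment-based approximation derived in the article: it assumes discrete (already binned) values and uses a simple plug-in entropy estimate, and all function names are hypothetical. The toy data set Z = X XOR Y shows a dependency that is invisible pair-wise (zero mutual information between X and Y) but appears once Z is conditioned on.

```python
import math
from collections import Counter
from itertools import product


def entropy(samples):
    """Plug-in Shannon entropy, in bits, of a list of hashable outcomes."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (entropy(list(zip(x, z))) + entropy(list(zip(y, z)))
            - entropy(list(zip(x, y, z))) - entropy(z))


def co_information(x, y, z):
    """Change in shared information between X and Y upon conditioning on Z."""
    return mutual_information(x, y) - conditional_mutual_information(x, y, z)


# Toy data (an assumption for illustration, not expression data):
# every (x, y) combination occurs equally often and z = x XOR y.
pairs = list(product([0, 1], repeat=2)) * 25
x = [a for a, _ in pairs]
y = [b for _, b in pairs]
z = [a ^ b for a, b in pairs]

mi = mutual_information(x, y)
cmi = conditional_mutual_information(x, y, z)
coi = co_information(x, y, z)

print(f"I(X;Y)   = {mi:.3f} bits")
print(f"I(X;Y|Z) = {cmi:.3f} bits")
print(f"I(X;Y;Z) = {coi:.3f} bits")
```

With real expression profiles the values are continuous and the sample size is small, so a plug-in estimate of this kind would be unreliable; that is the difficulty the moment-based approximation mentioned in the abstract is meant to avoid.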

Cited By

  • (2011) A Generalized Multivariate Approach to Pattern Discovery from Replicated and Incomplete Genome-Wide Measurements. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 8:5 (1153-1169). DOI: 10.1109/TCBB.2010.102. Online publication date: 1-Sep-2011

            Published In

            IEEE/ACM Transactions on Computational Biology and Bioinformatics  Volume 5, Issue 1
            January 2008
            159 pages

            Publisher

            IEEE Computer Society Press

            Washington, DC, United States

            Author Tags

            1. Co-information
            2. Entropy
            3. Gene expression data
            4. Information theory
            5. Statistical analysis

            Qualifiers

            • Article
