Reproducibility-Optimized Test Statistic for Ranking Genes in Microarray Studies

Published: 01 July 2008

Abstract

A principal goal of microarray studies is to identify the genes showing differential expression under distinct conditions. In such studies, the selection of an optimal test statistic is a crucial challenge, which depends on the type and amount of data under analysis. While previous studies on simulated or spike-in datasets do not provide practical guidance on how to choose the best method for a given real dataset, we introduce an enhanced reproducibility-optimization procedure, which enables the selection of a suitable gene-ranking statistic directly from the data. In comparison with existing ranking methods, the reproducibility-optimized statistic shows consistently good performance under various simulated conditions and on the Affymetrix spike-in dataset. Further, the feasibility of the novel statistic is confirmed in a practical research setting using data from an in-house cDNA microarray study of asthma-related gene expression changes. These results suggest that the procedure facilitates the selection of an appropriate test statistic for a given dataset without relying on a priori assumptions, which may bias the findings and their interpretation. Moreover, the general reproducibility-optimization procedure is not limited to detecting differential expression but could be extended to a wide range of other applications as well.
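The idea of selecting a gene-ranking statistic directly from the data can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the candidate statistics or the reproducibility measure, so the following choices are assumptions made for illustration only. Candidates are taken from a hypothetical family of modified t-type statistics |mean(x) - mean(y)| / (a1 + a2 * s), and reproducibility is measured as the average overlap of the top-k gene lists computed from pairs of bootstrap resamples of the samples. The function names (stat, top_k, resample, reproducibility, select_statistic) are likewise hypothetical, and the sketch omits any normalization or null-distribution correction the full procedure may use.

```python
import random
import statistics


def stat(x, y, a1, a2):
    """Assumed candidate family: |mean(x) - mean(y)| / (a1 + a2 * s)."""
    nx, ny = len(x), len(y)
    diff = abs(statistics.fmean(x) - statistics.fmean(y))
    pooled = ((nx - 1) * statistics.variance(x)
              + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    s = (pooled * (1 / nx + 1 / ny)) ** 0.5
    # Guard against a zero denominator in degenerate resamples.
    return diff / max(a1 + a2 * s, 1e-12)


def top_k(data_x, data_y, a1, a2, k):
    """Indices of the k genes with the largest statistic."""
    scores = [stat(gx, gy, a1, a2) for gx, gy in zip(data_x, data_y)]
    order = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return set(order[:k])


def resample(data, rng):
    """Bootstrap the samples (columns) of a genes-by-samples table."""
    n = len(data[0])
    cols = [rng.randrange(n) for _ in range(n)]
    return [[row[c] for c in cols] for row in data]


def reproducibility(data_x, data_y, a1, a2, k, n_boot, rng):
    """Mean top-k overlap between pairs of independent bootstrap datasets."""
    overlaps = []
    for _ in range(n_boot):
        first = top_k(resample(data_x, rng), resample(data_y, rng), a1, a2, k)
        second = top_k(resample(data_x, rng), resample(data_y, rng), a1, a2, k)
        overlaps.append(len(first & second) / k)
    return statistics.fmean(overlaps)


def select_statistic(data_x, data_y, candidates, k, n_boot=20, seed=0):
    """Return the (a1, a2) pair with the highest reproducibility, plus all scores."""
    rng = random.Random(seed)
    scored = {c: reproducibility(data_x, data_y, c[0], c[1], k, n_boot, rng)
              for c in candidates}
    best = max(scored, key=scored.get)
    return best, scored


if __name__ == "__main__":
    # Synthetic data: 100 genes, 5 samples per group, first 10 genes shifted.
    rng = random.Random(42)
    x = [[rng.gauss(2.0 if g < 10 else 0.0, 1.0) for _ in range(5)]
         for g in range(100)]
    y = [[rng.gauss(0.0, 1.0) for _ in range(5)] for g in range(100)]
    # (0, 1) is an ordinary t-type statistic; (1, 0) is the plain mean difference.
    candidates = [(0.0, 1.0), (1.0, 0.0), (0.5, 1.0)]
    best, scored = select_statistic(x, y, candidates, k=10, n_boot=10)
    print("selected (a1, a2):", best)
    print("reproducibility by candidate:", scored)
```

In this sketch the data decide which candidate is used: no candidate is preferred a priori, and a different dataset may lead to a different choice.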

Cited By

  • (2023) Methodology to identify a gene expression signature by merging microarray datasets. Computers in Biology and Medicine, 159:C. DOI: 10.1016/j.compbiomed.2023.106867. Online publication date: 1-Jun-2023
  • (2019) Maximal information coefficient applied to differentially expressed genes identification. Technology and Health Care, 27:S1 (249-262). DOI: 10.3233/THC-199024. Online publication date: 1-Jan-2019

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 5, Issue 3
July 2008
159 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Author Tags

  1. Microarray
  2. bootstrap
  3. differential expression
  4. gene expression
  5. gene ranking
  6. reproducibility

Qualifiers

  • Article
