DOI: 10.1145/1185448.1185540
Article

Automatic quality assessment of Affymetrix GeneChip data

Published: 10 March 2006

Abstract

Computing reliable gene expression levels from microarray experiments is a complex process with many potential pitfalls, and quality control is one of its most important steps. We present a web-based expert system for automatic quality assessment of Affymetrix GeneChip data. Our approach combines multiple quality metrics with supervised machine learning to identify low-quality data. The system approximates expert opinion as represented in a knowledge base of 41 microarray experiments comprising 352 CEL files annotated by a domain expert. Low-quality GeneChips are detected automatically and can be excluded from subsequent analysis. This is especially valuable for large experiments and can assist inexperienced users. The expert system is fully implemented and integrated into a publicly available remote analysis computation engine for gene expression data.
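
The general idea of learning an expert's good/bad judgment from several per-chip quality metrics can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the three metric names, the toy values, and the nearest-centroid classifier are not taken from the paper, which does not specify its metrics or learning algorithm in the abstract.

```python
from statistics import mean, pstdev

# Hypothetical per-chip quality metrics (NOT taken from the paper):
# (scale_factor, percent_present, gapdh_3_to_5_ratio)
# The labels stand in for a domain expert's annotation.
# All values are invented toy data.
TRAINING = [
    ((1.0, 48.0, 1.1), "good"),
    ((1.2, 45.0, 1.3), "good"),
    ((0.9, 50.0, 1.0), "good"),
    ((3.5, 25.0, 4.0), "bad"),
    ((4.0, 20.0, 3.5), "bad"),
    ((3.0, 28.0, 5.0), "bad"),
]


def fit(samples):
    """Learn per-metric scaling and one centroid per expert label."""
    columns = list(zip(*(features for features, _ in samples)))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 when a metric has zero spread, so division is safe.
    stds = [pstdev(col) or 1.0 for col in columns]

    def scale(features):
        # Standardize each metric so no single one dominates the distance.
        return tuple((x - m) / s for x, m, s in zip(features, means, stds))

    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(scale(features))
    centroids = {
        label: tuple(mean(dim) for dim in zip(*rows))
        for label, rows in grouped.items()
    }
    return scale, centroids


def predict(model, features):
    """Return the label whose centroid is closest to the scaled metrics."""
    scale, centroids = model
    point = scale(features)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(point, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


model = fit(TRAINING)
print(predict(model, (1.1, 47.0, 1.2)))  # chip resembling the "good" examples
print(predict(model, (3.8, 22.0, 4.2)))  # chip resembling the "bad" examples
```

A nearest-centroid rule is used here only because it fits in a few lines of standard-library Python; any supervised classifier trained on expert-labeled chips would play the same role in such a pipeline.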

Cited By

  • (2009) Quality Skyline in Sensor Database. Proceedings of the 2009 First International Workshop on Database Technology and Applications, pp. 578-581. DOI: 10.1109/DBTA.2009.10. Online publication date: 25-Apr-2009
  • (2007) Quality Assessment of Affymetrix GeneChip Data using the EM Algorithm and a Naive Bayes Classifier. 2007 IEEE 7th International Symposium on BioInformatics and BioEngineering, pp. 145-150. DOI: 10.1109/BIBE.2007.4375557. Online publication date: Oct-2007

Published In

ACM Other conferences
ACMSE '06: Proceedings of the 44th annual ACM Southeast Conference
March 2006
823 pages
ISBN: 1595933158
DOI: 10.1145/1185448
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Affymetrix GeneChip microarray experiments
  2. automated quality assessment
  3. knowledge-based expert system

Qualifiers

  • Article

Conference

ACM SE06: ACM Southeast Regional Conference
March 10 - 12, 2006
Melbourne, Florida

Acceptance Rates

ACMSE '06 Paper Acceptance Rate 100 of 244 submissions, 41%;
Overall Acceptance Rate 502 of 1,023 submissions, 49%
