An Evaluation of Information Content as a Metric for the Inference of Putative Conserved Noncoding Regions in DNA Sequences Using a Genetic Algorithms Approach

Published: 01 January 2008

Abstract

In previous work, we presented GAMI [1], an approach to motif inference that uses a genetic algorithms search. GAMI is designed specifically to find putative conserved regulatory motifs in noncoding regions of divergent species, and is designed to allow for analysis of long nucleotide sequences. In this work, we compare GAMI's performance when run with its original fitness function (a simple count of the number of matches) and when run with information content, as well as several variations on these metrics. Results indicate that information content does not identify highly conserved regions, and thus is not the appropriate metric for this task, while variations on information content as well as the original metric succeed in identifying putative conserved regions.
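To make the two kinds of metric concrete, the sketch below scores a set of equal-length motif instances (one per sequence) in two ways: a simple count of matching positions against a candidate motif, and the information content of the instances' column frequencies. It is an illustration only, not GAMI's implementation: the function names, the uniform background probability of 0.25, and the relative-entropy form of information content are assumptions made for this example.

```python
import math
from collections import Counter


def match_count(motif, instances):
    """Count the positions at which each instance agrees with the candidate motif.

    Hypothetical helper: one plausible reading of "a simple count of the
    number of matches"; the paper's exact definition may differ.
    """
    return sum(a == b for inst in instances for a, b in zip(motif, inst))


def information_content(instances, background=0.25):
    """Information content in bits of equal-length aligned instances.

    Sums f * log2(f / background) over every base observed in every column,
    where f is the base's frequency in that column. A uniform background of
    0.25 per nucleotide is an assumption of this sketch.
    """
    n = len(instances)
    total = 0.0
    for column in zip(*instances):
        for count in Counter(column).values():
            f = count / n
            total += f * math.log2(f / background)
    return total


# Four perfectly conserved instances versus four that vary in the last column.
conserved = ["ACGT", "ACGT", "ACGT", "ACGT"]
variable = ["ACGT", "ACGA", "ACGC", "ACGG"]

print(match_count("ACGT", conserved), information_content(conserved))
print(match_count("ACGT", variable), information_content(variable))
```

In this toy case both scores drop when the last column varies. The abstract's finding concerns how the metrics behave as fitness functions in a search over long sequences, which a small example like this does not reproduce.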

References

[1]
C.B. Congdon, C.W. Fizer, N.W. Smith, H.R. Gaskins, J. Aman, G.M. Nava, and C. Mattingly, "Preliminary Results for GAMI: A Genetic Algorithms Approach to Motif Inference," Proc. 2005 IEEE Symp. Computational Intelligence in Bioinformatics and Computational Biology (CIBCB '05), pp. 97-104, 2005.
[2]
V. Matys, E. Fricke, R. Geffers, E. Gossling, M. Haubrock, R. Hehl, K. Hornischer, D. Karas, A.E. Kel, O.V. Kel-Morgoulis, D.U. Kloos, S. Land, B. Lewicki-Potapov, H. Michael, R. Munch, I. Reuter, S. Rotert, H. Saxel, M. Scheer, S. Thiele, and E. Wingender, "Transfac: Transcriptional Regulation, from Patterns to Profiles," Nucleic Acids Research, vol. 31, pp. 374-378, 2003.
[3]
L.A. Pennacchio and E.M. Rubin, "Comparative Genomic Tools and Databases: Providing Insights into the Human Genome," J. Clinical Investigation, vol. 111, pp. 1099-1106, 2003.
[4]
J.W. Thomas and J.W. Touchman, "Vertebrate Genome Sequencing: Building a Backbone for Comparative Genomics," Trends in Genetics, vol. 18, pp. 104-108, 2002.
[5]
J.W. Thomas, J.W. Touchman, R.W. Blakesley, G.G. Bouffard, S.M. Beckstrom-Sternberg, and E.H. Margulies, "Comparative Analyses of Multi-Species Sequences from Targeted Genomic Regions," Nature, vol. 424, pp. 788-793, 2003.
[6]
I. Dubchak, M. Brudno, G.G. Loots, L. Pachter, C. Mayor, E.M. Rubin, and K.A. Frazer, "Active Conservation of Non-Coding Sequences Revealed by Three-Way Species Comparisons," Genome Research, vol. 10, pp. 1304-1306, 2000.
[7]
S. Santini, J.L. Boore, and A. Meyer, "Evolutionary Conservation of Regulatory Elements in Vertebrate Hox Gene Clusters," Genome Research, vol. 13, pp. 1111-1122, 2003.
[8]
S. Aparicio, A. Morrison, A. Gould, J. Gilthorpe, C. Chaudhuri, P. Rigby, R. Krumlauf, and S. Brenner, "Detecting Conserved Regulatory Elements with the Model Genome of the Japanese Puffer Fish, Fugu Rubripes," Proc. Nat'l Academy of Sciences, vol. 92, pp. 1684-1688, 1995.
[9]
J.D. Hughes, P.W. Estep, S. Tavazoie, and G.M. Church, "Computational Identification of CIS-Regulatory Elements Associated with Groups of Functionally Related Genes in Saccharomyces cerevisiae," J. Molecular Biology, vol. 296, pp. 1205-1214, 2000.
[10]
G.G. Loots, R.M. Locksley, C.M. Blankespoor, Z.E. Wang, W. Miller, E.M. Rubin, and K.A. Frazer, "Identification of a Coordinate Regulator of Interleukins 4, 13 and 5 by Cross-Species Sequence Comparisons," Science, vol. 288, pp. 136-140, 2000.
[11]
M. Tompa, N. Li, T. Bailey, G. Church, B. De Moor, E. Eskin, A. Favorov, M. Frith, Y. Fu, J. Kent, V. Makeev, A. Mironov, W. Noble, G. Pavesi, G. Pesole, and M. Ry, "An Assessment of Computational Tools for the Discovery of Transcription Factor Binding Sites," Nature Biotechnology, vol. 23, no. 1, pp. 137-144, Jan. 2005.
[12]
K. Cartharius, K. Frech, K. Grote, B. Klocke, M. Haltmeier, A. Klingenhoff, M. Frisch, M. Bayerlein, and T. Werner, "Matinspector and Beyond: Promoter Analysis Based on Transcription Factor Binding Sites," Bioinformatics, vol. 21, no. 13, pp. 2933-2942, 2005.
[13]
M.A. Lones and A.M. Tyrrell, "The Evolutionary Computation Approach to Motif Discovery in Biological Sequences," Proc. ACM SIGEVO Genetic and Evolutionary Computation Conf. (GECCO '05); Workshop Biological Applications of Genetic and Evolutionary Computation, 2005.
[14]
D. Corne, A. Meade, and R. Sibly, "Evolving Core Promoter Signal Motifs," Proc. 2001 Congress on Evolutionary Computation (CEC '01), pp. 1162-1169, 2001.
[15]
A. Meade, D. Corne, and R. Sibly, "Discovering Patterns in Microsatellite Flanks with Evolutionary Computation by Evolving Discriminatory DNA Motifs," Proc. 2002 Congress Evolutionary Computation (CEC '02), pp. 1-6, 2002.
[16]
J.D. Thompson, D.G. Higgins, and T.J. Gibson, "CLUSTAL W: Improving the Sensitivity of Progressive Multiple Sequence Alignment through Sequence Weighting, Position-Specific Gap Penalties and Weight Matrix Choice," Nucleic Acids Research, vol. 22, no. 22, 1994.
[17]
S. Schwartz, Z. Zhang, K.A. Frazer, A. Smit, C. Riemer, J. Bouck, R. Gibbs, R. Hardison, and W. Miller, "PipMaker--A Web Server for Aligning Two Genomic DNA Sequences," Genome Research, vol. 10, no. 4, pp. 577-586, Apr. 2000.
[18]
T.L. Bailey and C. Elkan, "Fitting a Mixture Model by Expectation Maximization to Discover Motifs in Biopolymers," Proc. Second Int'l Conf. Intelligent Systems for Molecular Biology, pp. 28-36, 1994.
[19]
W. Thompson, E.C. Rouchka, and C.E. Lawrence, "Gibbs Recursive Sampler: Finding Transcription Factor Binding Sites," Nucleic Acids Research, vol. 31, no. 13, pp. 3580-3585, 2003.
[20]
G.B. Fogel, D.G. Weekes, G. Varga, H.B. Harlow, J.E. Onyia, and C. Su, "Discovery of Sequence Motifs Related to Co-Expression of Genes Using Evolutionary Computation," Nucleic Acids Research, vol. 32, no. 13, pp. 3826-3835, 2004.
[21]
C.F. Higgins, "ABC Transporters: From Microorganisms to Man," Ann. Rev. of Cell Biology, vol. 8, pp. 67-113, 1992.
[22]
M. Dean, A. Rzhetsky, and R. Allikmets, "The Human ATP-Binding Cassette (ABC) Transporter Superfamily," Genome Research, vol. 11, pp. 1156-1166, 2001.
[23]
E.M. Leslie, R.G. Deeley, and S.P. Cole, "Toxicological Relevance of the Multidrug Resistance Protein 1, Mrp1 (ABCC1) and Related Transporters," Toxicology, vol. 167, pp. 3-23, 2001.
[24]
J.D. Hayes, J.U. Flanagan, and I.R. Jowsey, "Glutathione Transferases," Ann. Rev. of Pharmacology and Toxicology, vol. 45, pp. 51-88, 2005.
[25]
C.C. McIlwain, D.M. Townsend, and K.D. Tew, "Glutathione S-Transferase Polymorphisms: Cancer Incidence and Therapy," Oncogene, vol. 25, no. 11, pp. 1639-1648, 2006.
[26]
A. Woolfe et al., "Highly Conserved Non-Coding Sequences Are Associated with Vertebrate Development," PLoS Biology, vol. 3, no. e7, pp. 116-130, 2005.
[27]
D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.
[28]
L. Davis, Handbook of Genetic Algorithms. Van Nostrand Reinhold, 1991.
[29]
M. Mitchell, An Introduction to Genetic Algorithms. MIT Press, 1996.
[30]
V. Curwen, E. Eyras, T.D. Andrews, L. Clarke, E. Mongin, S.M. Searle, and M. Clamp, "The Ensembl Automatic Gene Annotation System," Genome Research, vol. 14, pp. 942-950, 2004.
[31]
K.D. Pruitt, T. Tatusov, and D.R. Maglott, "NCBI Reference Sequence (REFSEQ): A Curated Non-Redundant Sequence Database of Genomes, Transcripts and Proteins," Nucleic Acids Research, vol. 33, pp. D501-D504, 2005.
[32]
A. Marchler-Bauer and S.H. Bryant, "CD-Search: Protein Domain Annotations on the Fly," Nucleic Acids Research, vol. 32, pp. W327-W331, 2004.
[33]
T.F. Smith and M.S. Waterman, "Identification of Common Molecular Subsequences," J. Molecular Biology, vol. 147, pp. 195-197, 1981.
[34]
M. Clamp, J. Cuff, S.M. Searle, and G.J. Barton, "The Jalview Java Alignment Editor," Bioinformatics, vol. 20, pp. 426-427, 2004.
[35]
W.R. Pearson, "Searching Protein Sequence Libraries: Comparison of the Sensitivity and Selectivity of the Smith-Waterman and FASTA Algorithms," Genomics, vol. 11, pp. 635-650, 1991.
[36]
M. Brudno et al., "Lagan and Multi-Lagan: Efficient Tools for Large Scale Multiple Alignment of Genomic DNA," Genome Research, vol. 13, pp. 721-731, 2003.
[37]
A.F.A. Smit and P. Green, "Repeatmasker Open-3.0," http://www.repeatmasker.org, 1996-2004.
[38]
J.J. Grefenstette, "A User's Guide to GENESIS," technical report, Navy Center for Applied Research in AI, 1987, source code updated 1990, http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/genetic/ga/systems/genesis/.
[39]
G.Z. Hertz and G.D. Stormo, "Identifying DNA and Protein Patterns with Statistically Significant Alignments of Multiple Sequences," Bioinformatics, vol. 15, no. 7, pp. 563-577, 1999.
[40]
G.E. Crooks, G. Hon, J.M. Chandonia, and S.E. Brenner, "WebLogo: A Sequence Logo Generator," Genome Research, vol. 14, pp. 1188-1190, 2004.
[41]
N. Mouchel, S.A. Henstra, V.A. McCarthy, S.H. Williams, M. Phylactides, and A. Harris, "Hnf1alpha Is Involved in Tissue-Specific Regulation of CFTR Gene Expression," Biochemical J., vol. 378, no. Pt 3, pp. 909-918, 15 Mar. 2004.
[42]
M. Levinson-Dushnik and N. Benvenisty, "Involvement of Hepatocyte Nuclear Factor 3 in Endoderm Differentiation of Embryonic Stem Cells," Molecular and Cellular Biology, vol. 17, no. 7, pp. 3817-3822, July 1997.
[43]
X. Hu, J.R. Roberts, P.L. Apopa, Y.W. Kan, and Q. Ma, "Accelerated Ovarian Failure Induced By 4-Vinyl Cyclohexene Diepoxide in Nrf2 Null Mice," Molecular and Cellular Biology, vol. 26, no. 3, pp. 940-954, 2006.
[44]
V. Bombail, K. Taylor, G.G. Gibson, and N. Plant, "Role of Sp1, C/EBP Alpha, HNF3 and PXR in the Basal- and Xenobiotic-Mediated Regulation of the CYP3A4 Gene," Drug Metabolism and Disposition, vol. 32, no. 5, pp. 525-535, 2004.

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 5, Issue 1
January 2008
159 pages

Publisher

IEEE Computer Society Press

Washington, DC, United States

Publication History

Published: 01 January 2008
Published in TCBB Volume 5, Issue 1

Author Tags

  1. Biology and genetics
  2. Evolutionary computing and genetic algorithms

Qualifiers

  • Article

Cited By

  • (2014) A workflow for the computational identification of candidate regulatory elements in noncoding DNA. Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 643-644. DOI: 10.1145/2649387.2660815. Online publication date: 20-Sep-2014
  • (2013) Initial Results In Using de Novo Motif Inference to Detect Cis-Regulatory Modules. Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics, pp. 687-688. DOI: 10.1145/2506583.2506689. Online publication date: 22-Sep-2013
  • (2012) Trade-offs using GAMID for the inference of DNA motifs that are represented in only a subset of sequences of interest. Proceedings of the 14th annual conference companion on Genetic and evolutionary computation, pp. 563-568. DOI: 10.1145/2330784.2330874. Online publication date: 7-Jul-2012
  • (2012) GAMIV. Proceedings of the 14th annual conference companion on Genetic and evolutionary computation, pp. 555-558. DOI: 10.1145/2330784.2330872. Online publication date: 7-Jul-2012
  • (2010) Automated extraction of extended structured motifs using multi-objective genetic algorithm. Expert Systems with Applications: An International Journal, vol. 37, no. 3, pp. 2421-2426. DOI: 10.1016/j.eswa.2009.06.101. Online publication date: 1-Mar-2010
  • (2009) GAPK. Proceedings of the Eleventh conference on Congress on Evolutionary Computation, pp. 277-284. DOI: 10.5555/1689599.1689636. Online publication date: 18-May-2009
  • (2009) Modeling evolutionary fitness for DNA motif discovery. Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pp. 225-232. DOI: 10.1145/1569901.1569933. Online publication date: 8-Jul-2009
  • (2008) An improved genetic algorithm for DNA motif discovery with public domain information. Proceedings of the 15th international conference on Advances in neuro-information processing - Volume Part I, pp. 521-528. DOI: 10.5555/1813488.1813556. Online publication date: 25-Nov-2008
  • (2008) It's not junk! ACM SIGEVOlution, vol. 3, no. 3, pp. 5-16. DOI: 10.1145/1562108.1562110. Online publication date: 1-Aug-2008
