Article

MDGA: motif discovery using a genetic algorithm

Authors:
Dongsheng Che

University of Georgia, Athens, GA

University of Georgia, Athens, GA
View Profile

,
Yinglei Song

University of Georgia, Athens, GA

University of Georgia, Athens, GA
View Profile

,
Khaled Rasheed

University of Georgia, Athens, GA

University of Georgia, Athens, GA
View Profile

GECCO '05: Proceedings of the 7th annual conference on Genetic and evolutionary computationJune 2005Pages 447–452https://doi.org/10.1145/1068009.1068080

Published:25 June 2005Publication History

GECCO '05: Proceedings of the 7th annual conference on Genetic and evolutionary computation

Pages 447–452

ABSTRACT

Computationally identifying transcription factor binding sites in the promoter regions of genes is an important problem in computational biology and has been under intensive research for a decade. To predict the binding site locations efficiently, many algorithms that incorporate either approximate or heuristic techniques have been developed. However, the prediction accuracy is not satisfactory and binding site prediction thus remains a challenging problem. In this paper, we develop an approach that can be used to predict binding site motifs using a genetic algorithm. Based on the generic framework of a genetic algorithm, the approach explores the search space of all possible starting locations of the binding site motifs in different target sequences with a population that undergoes evolution. Individuals in the population compete to participate in the crossovers and mutations occur with a certain probability. Initial experiments demonstrated that our approach could achieve high prediction accuracy in a small amount of computation time. A promising advantage of our approach is the fact that the computation time does not explicitly depend on the length of target sequences and hence may not increase significantly when the target sequences become very long.

References

Bailey, T.L. and Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, Stanford, CA, AAAI Press, Bethesda, MD, 1994, 28--36.Google Scholar
Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: A sequence logo generator, Genome Research, 14, 6 (June 2004), 1188--1190.Google ScholarCross Ref
Eiben, A.E. and Smith, J.E. Introduction to Evolutionary Computing. Springer-Verlag, New York. 2003. Google ScholarDigital Library
Galas, D.J. and Schmitz, A. A DNA footprinting: a simple method for the detection of protein-DNA binding specificity. Nucleic Acids Res., 5, 9 (Sep. 1978), 3157--3170.Google ScholarCross Ref
Garner, M. M., and A. Revzin. A gel electrophoresis method for quantifying the binding of proteins to specific DNA regions: application to components of the Escherichia coli lactose operon regulatory system. Nucleic Acids Res. 9, 13 (July, 1981), 3047--3060.Google ScholarCross Ref
Harbison, C.T., Gordon, D.B., Lee, T.I., Rinaldi, N., Macisaac, K.D., Danford, T.D., Hannett, N.M., Tagne, J.-B., Reynolds, D.B., Yoo, J., Jennings, E.G., Zeitlinger, J., Pokholok, D.K., Kellis, M., Rolfe, P.A., Takusagawa, K.T., Lander, E.S., Gifford, D.K., Fraenkel, E. and Young, R.A. Transcriptional Regulatory Code of a Eukaryotic Genome. Nature, 431, 7004 (Sep. 2004), 99--104.Google ScholarCross Ref
Hertz, G.Z. and Stormo, G.D. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics, 15, 7 (July, 1999), 563--577.Google ScholarCross Ref
Lawrence, C.E., Altschul, S.F., Boguski, M.S., Liu, J.S., Neuwald, A.F. and Wooton, J.C. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262, 8 (Oct. 1993), 208--214.Google ScholarCross Ref
Liu, F.F.M., Tsai, J.J.P., Chen, R.M., Chen, S.N. and Shih, S.H. FMGA: finding motifs by genetic algorithm. IEEE Fourth Symposium on Bioinformatics and Bioengineering (BIBE 2004), May 2004, 459--466. Google ScholarDigital Library
Liu, J.S., Neuwald, A.F. and Lawrence, C.E. Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Am. Stat. Assoc., 90, 432 (Nov. 1995), 1156--1170.Google ScholarCross Ref
Liu, X., Brutlag, D.L. and Liu, J.S. BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac. Symp. Biocomput., 6, 2001, 127--138.Google Scholar
Notredame, C. and Higgins, D.G. SAGA: Sequence alignment by genetic algorithm. Nucleic Acids Res. 24, 8 (Apr. 1996), 1515--1524Google ScholarCross Ref
Roth, F.R., Hughes, J. D., Estep, P. E. and Church G. M. Finding DNA regulatory motifs within unaligned non-coding sequences clustered by whole-Genome mRNA quantitation. Nature Biotechnology 16, 10 (Oct. 1998), 939--45.Google ScholarCross Ref
Stormo, G.D. Computer methods for analyzing sequence recognition of nucleic acids. Annu. Rev. BioChem. 17, 1988, 241--263.Google Scholar
Stormo,G.D. and Hartzell, G.W. Identifying protein-binding sites from unaligned DNA fragments. Proc. Natl Acad. Sci. USA, 86, 4, (Feb. 1989), 1183--1187.Google ScholarCross Ref

Index Terms

MDGA: motif discovery using a genetic algorithm
1. Computing methodologies
  1. Artificial intelligence
    1. Control methods
    2. Search methodologies
2. Theory of computation
  1. Design and analysis of algorithms
    1. Algorithm design techniques
      1. Dynamic programming

Recommendations

Finding motifs for insufficient number of sequences with strong binding to transcription facto
RECOMB '04: Proceedings of the eighth annual international conference on Research in computational molecular biology

Finding motifs is an important problem in computational biology. Our paper makes two major contributions to this problem. Firstly, we better characterize the types of problem instances that cannot be solved by most existing methods of finding motifs. ...
Read More
Assessment of clustering algorithms for unsupervised transcription factor binding site discovery

Identification of transcription factor binding sites is a key task to understand gene regulation mechanism to discover gene networks and functions. Clustering approach is proved to be useful when finding such patterns residing in promoter regions of co-...
Read More
Algorithms for phylogenetic footprinting
RECOMB '01: Proceedings of the fifth annual international conference on Computational biology

Phylogenetic footprinting is a technique that identifies regulatory elements by finding unusually well conserved regions in a set of orthologous non-coding DNA sequences from multiple species. In an earlier paper, we presented an exact algorithm that ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
GECCO '05: Proceedings of the 7th annual conference on Genetic and evolutionary computation
June 2005
2272 pages
ISBN:1595930108
DOI:10.1145/1068009
Editor:
Hans-Georg Beyer
Vorarlberg University of Applied Sciences, Austria
,
General Chair:
Una-May O'Reilly
CSAIL, MIT
Copyright © 2005 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 25 June 2005
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
genetic algorithms
motif finding
transcription factor
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate1,669of4,410submissions,38%
Upcoming Conference
GECCO '24

Sponsor:

sigevo

Genetic and Evolutionary Computation Conference

July 14 - 18, 2024

Melbourne , VIC , Australia
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 60
  Total Citations
  View Citations
- 632
  Total Downloads
- Downloads (Last 12 months)4
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

MDGA: motif discovery using a genetic algorithm

GECCO '05: Proceedings of the 7th annual conference on Genetic and evolutionary computation

ABSTRACT

References

Cited By

Index Terms

Recommendations

Finding motifs for insufficient number of sequences with strong binding to transcription facto

Assessment of clustering algorithms for unsupervised transcription factor binding site discovery

Algorithms for phylogenetic footprinting