skip to main content
10.1145/1068009.1068080acmconferencesArticle/Chapter ViewAbstractPublication PagesgeccoConference Proceedingsconference-collections
Article

MDGA: motif discovery using a genetic algorithm

Published:25 June 2005Publication History

ABSTRACT

Computationally identifying transcription factor binding sites in the promoter regions of genes is an important problem in computational biology and has been under intensive research for a decade. To predict the binding site locations efficiently, many algorithms that incorporate either approximate or heuristic techniques have been developed. However, the prediction accuracy is not satisfactory and binding site prediction thus remains a challenging problem. In this paper, we develop an approach that can be used to predict binding site motifs using a genetic algorithm. Based on the generic framework of a genetic algorithm, the approach explores the search space of all possible starting locations of the binding site motifs in different target sequences with a population that undergoes evolution. Individuals in the population compete to participate in the crossovers and mutations occur with a certain probability. Initial experiments demonstrated that our approach could achieve high prediction accuracy in a small amount of computation time. A promising advantage of our approach is the fact that the computation time does not explicitly depend on the length of target sequences and hence may not increase significantly when the target sequences become very long.

References

  1. Bailey, T.L. and Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, Stanford, CA, AAAI Press, Bethesda, MD, 1994, 28--36.Google ScholarGoogle Scholar
  2. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: A sequence logo generator, Genome Research, 14, 6 (June 2004), 1188--1190.Google ScholarGoogle ScholarCross RefCross Ref
  3. Eiben, A.E. and Smith, J.E. Introduction to Evolutionary Computing. Springer-Verlag, New York. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Galas, D.J. and Schmitz, A. A DNA footprinting: a simple method for the detection of protein-DNA binding specificity. Nucleic Acids Res., 5, 9 (Sep. 1978), 3157--3170.Google ScholarGoogle ScholarCross RefCross Ref
  5. Garner, M. M., and A. Revzin. A gel electrophoresis method for quantifying the binding of proteins to specific DNA regions: application to components of the Escherichia coli lactose operon regulatory system. Nucleic Acids Res. 9, 13 (July, 1981), 3047--3060.Google ScholarGoogle ScholarCross RefCross Ref
  6. Harbison, C.T., Gordon, D.B., Lee, T.I., Rinaldi, N., Macisaac, K.D., Danford, T.D., Hannett, N.M., Tagne, J.-B., Reynolds, D.B., Yoo, J., Jennings, E.G., Zeitlinger, J., Pokholok, D.K., Kellis, M., Rolfe, P.A., Takusagawa, K.T., Lander, E.S., Gifford, D.K., Fraenkel, E. and Young, R.A. Transcriptional Regulatory Code of a Eukaryotic Genome. Nature, 431, 7004 (Sep. 2004), 99--104.Google ScholarGoogle ScholarCross RefCross Ref
  7. Hertz, G.Z. and Stormo, G.D. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics, 15, 7 (July, 1999), 563--577.Google ScholarGoogle ScholarCross RefCross Ref
  8. Lawrence, C.E., Altschul, S.F., Boguski, M.S., Liu, J.S., Neuwald, A.F. and Wooton, J.C. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262, 8 (Oct. 1993), 208--214.Google ScholarGoogle ScholarCross RefCross Ref
  9. Liu, F.F.M., Tsai, J.J.P., Chen, R.M., Chen, S.N. and Shih, S.H. FMGA: finding motifs by genetic algorithm. IEEE Fourth Symposium on Bioinformatics and Bioengineering (BIBE 2004), May 2004, 459--466. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Liu, J.S., Neuwald, A.F. and Lawrence, C.E. Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Am. Stat. Assoc., 90, 432 (Nov. 1995), 1156--1170.Google ScholarGoogle ScholarCross RefCross Ref
  11. Liu, X., Brutlag, D.L. and Liu, J.S. BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac. Symp. Biocomput., 6, 2001, 127--138.Google ScholarGoogle Scholar
  12. Notredame, C. and Higgins, D.G. SAGA: Sequence alignment by genetic algorithm. Nucleic Acids Res. 24, 8 (Apr. 1996), 1515--1524Google ScholarGoogle ScholarCross RefCross Ref
  13. Roth, F.R., Hughes, J. D., Estep, P. E. and Church G. M. Finding DNA regulatory motifs within unaligned non-coding sequences clustered by whole-Genome mRNA quantitation. Nature Biotechnology 16, 10 (Oct. 1998), 939--45.Google ScholarGoogle ScholarCross RefCross Ref
  14. Stormo, G.D. Computer methods for analyzing sequence recognition of nucleic acids. Annu. Rev. BioChem. 17, 1988, 241--263.Google ScholarGoogle Scholar
  15. Stormo,G.D. and Hartzell, G.W. Identifying protein-binding sites from unaligned DNA fragments. Proc. Natl Acad. Sci. USA, 86, 4, (Feb. 1989), 1183--1187.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. MDGA: motif discovery using a genetic algorithm

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          GECCO '05: Proceedings of the 7th annual conference on Genetic and evolutionary computation
          June 2005
          2272 pages
          ISBN:1595930108
          DOI:10.1145/1068009

          Copyright © 2005 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 25 June 2005

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate1,669of4,410submissions,38%

          Upcoming Conference

          GECCO '24
          Genetic and Evolutionary Computation Conference
          July 14 - 18, 2024
          Melbourne , VIC , Australia

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader