skip to main content
10.1145/974614.974630acmconferencesArticle/Chapter ViewAbstractPublication PagesrecombConference Proceedingsconference-collections
Article

From profiles to patterns and back again: a branch and bound algorithm for finding near optimal motif profiles

Published:27 March 2004Publication History

ABSTRACT

An important part of deciphering gene regulatory mechanisms is discovering transcription factor binding sites. In many cases, these sites can be detected because they are often overrepresented in genomic sequences. The detection of the overrepresented signals in sequences, or motif-finding has become a central problem in computational biology. There are two major computational frameworks for attacking the motif finding problem which differ in their representation of the signals. The most popular is the profile or PSSM (Position Specific Scoring Matrix) representation. The goal of these algorithms is to obtain probabilistic representations of the overrepresented signals. Another is the consensus pattern or pattern with mismatches representation which represents a signal as discrete consensus pattern and allows some mismatches to occur in each instance of the pattern. The advantage of profiles is the expressiveness of their representation while the advantage of the consensus pattern approach is the existence of efficient algorithms that guarantee discovery of the best patterns. In this paper we present a unified framework for motif finding which encompasses both the profile representation and the consensus pattern representation. We prove that the problem of discovering the best profiles can be solved by considering a degenerate version of the problem of finding the best consensus patterns. The main advantage of our framework is that it motivates a novel algorithm, MITRA-PSSM, which discovers profiles, yet provides some of the guarantees of discovering the best signals. The algorithm searches for best profiles with respect to information content which is the same criterion of popular algorithms such as MEME and CONSENSUS. MITRA-PSSM is specifically designed for searching for profiles in this framework and introduces a novel notion of scoring consensus patterns, discrete information content. MITRA-PSSM is available for public use via webserver at http://www.calit2.net/compbio/mitra/.

References

  1. T. L. Bailey and C. Elkan. Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning, 21:51, 1995.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Yoseph Barash, Gill Bejerano, and Nir Friedman. A simple hyper-geometric approach for discovering putative transcription factor binding sites. In Proceedings of WABI, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. O.G. Berg and P. H. von Hippel. Selection of DNA binding sites by regulatory proteins. Trends Biochem Sci., 13:207--211, Jun 1998.]]Google ScholarGoogle ScholarCross RefCross Ref
  4. J. Buhler and M. Tompa. Finding motifs using random projections. In Proceedings of the Fifth Annual International Conference on Computational Molecular Biology (RECOMB01), pages 69--76, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. P. Dempster, N. Laird, and D. Rubin. Maximum likelihood for incomplete data via the EM algorithm. Journal of the Royal Statistical Society, ser. B, 39:1--38, 1977.]]Google ScholarGoogle Scholar
  6. Alain Denise, Mireille Régnier, and Mathias Vandenbogaert. Assessing the statistical significance of overrepresented oligonucleotides. In Proceedings of WABI, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. E. Eskin, U. Keich, M.S. Gelfand, and P.A. Pevzner. Genome wide analysis of bacterial promoter regions. In Proceedings of the 2003 Pacific Symposium on Biocomputing, 2003.]]Google ScholarGoogle Scholar
  8. E. Eskin and P. A. Pevzner. Finding composite regulatory patterns in dna sequences. In Special Issue Proceedings of the Tenth International Conference on Intelligent Systems for Molecular Biology (ISMB-2002) Bioinformatics., pages 1:S354--63, 2002.]]Google ScholarGoogle ScholarCross RefCross Ref
  9. D. J. Galas, M. Eggert, and M. S. Waterman. Rigorous pattern--recognition methods for DNA sequences. Journal of Molecular Biology, 186:117--128, 1985.]]Google ScholarGoogle ScholarCross RefCross Ref
  10. G. Hertz and G. Stormo. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics, 15:563--577, 1999.]]Google ScholarGoogle ScholarCross RefCross Ref
  11. J.D. Hughes, P.W. Estep, S. Tavazoie, and G.M. Church. Computational identification of cis-regulatory elements associated with groups of functionally related genes in saccharomyces cerevisiae. Journal of Molecular Biology, 296(5):1205--14, 2000.]]Google ScholarGoogle ScholarCross RefCross Ref
  12. U. Keich and P. A. Pevzner. Finding motifs in the twilight zone. In Proceedings of the 6th International Conference on Computational Molecular Biology (RECOMB 2002), 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald, and J. C. Wootton. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262:208--214, 1993.]]Google ScholarGoogle ScholarCross RefCross Ref
  14. A. Neuwald, J. Liu, and C. Lawrence. Gibbs motif sampling: Detection of bacterial outer membrane repeats. Protein Science, 4:1618--1632, 1995.]]Google ScholarGoogle ScholarCross RefCross Ref
  15. P. A. Pevzner and S. Sze. Combinatorial approaches to finding subtle signals in DNA sequences. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, pages 269--278, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. Sagot. Spelling approximate or repeated motifs using a suffix tree. Lecture Notes in Computer Science, 1380:111--127, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. E. Segal, Y. Barash, I. Simon, N. Friedman, and D. Koller. From promoter sequence to expression: A probabilistic framework. In Proceedings of the 6th International Conference on Research in Computational Molecular Biology (RECOMB), 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. S. Sinha and M. Tompa. Discovery of novel transcription factor binding sites by statistical overrepresentation. Nucleic Acids Research, 30:5549--5560, 2002.]]Google ScholarGoogle ScholarCross RefCross Ref
  19. S. Sze, M. S. Gelfand, and P. A. Pevzner. Finding subtle motifs in DNA sequences. In Proceedings of Pacific Symposium on Biocomputing, pages 235--246, 2002.]]Google ScholarGoogle Scholar
  20. M.S. Waterman, R. Arratia, and D.J. Galas. Pattern recognition in several sequences: consensus and alignm ent. Bulletin of Mathematical Biology, 46:515--527, 1984.]]Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. From profiles to patterns and back again: a branch and bound algorithm for finding near optimal motif profiles

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          RECOMB '04: Proceedings of the eighth annual international conference on Research in computational molecular biology
          March 2004
          370 pages
          ISBN:1581137559
          DOI:10.1145/974614

          Copyright © 2004 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 27 March 2004

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate148of538submissions,28%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader