Article

From profiles to patterns and back again: a branch and bound algorithm for finding near optimal motif profiles

Author:
Eleazar Eskin

University of California, San Diego, CA

University of California, San Diego, CA
View Profile

RECOMB '04: Proceedings of the eighth annual international conference on Research in computational molecular biologyMarch 2004Pages 115–124https://doi.org/10.1145/974614.974630

Published:27 March 2004Publication History

RECOMB '04: Proceedings of the eighth annual international conference on Research in computational molecular biology

Pages 115–124

ABSTRACT

An important part of deciphering gene regulatory mechanisms is discovering transcription factor binding sites. In many cases, these sites can be detected because they are often overrepresented in genomic sequences. The detection of the overrepresented signals in sequences, or motif-finding has become a central problem in computational biology. There are two major computational frameworks for attacking the motif finding problem which differ in their representation of the signals. The most popular is the profile or PSSM (Position Specific Scoring Matrix) representation. The goal of these algorithms is to obtain probabilistic representations of the overrepresented signals. Another is the consensus pattern or pattern with mismatches representation which represents a signal as discrete consensus pattern and allows some mismatches to occur in each instance of the pattern. The advantage of profiles is the expressiveness of their representation while the advantage of the consensus pattern approach is the existence of efficient algorithms that guarantee discovery of the best patterns. In this paper we present a unified framework for motif finding which encompasses both the profile representation and the consensus pattern representation. We prove that the problem of discovering the best profiles can be solved by considering a degenerate version of the problem of finding the best consensus patterns. The main advantage of our framework is that it motivates a novel algorithm, MITRA-PSSM, which discovers profiles, yet provides some of the guarantees of discovering the best signals. The algorithm searches for best profiles with respect to information content which is the same criterion of popular algorithms such as MEME and CONSENSUS. MITRA-PSSM is specifically designed for searching for profiles in this framework and introduces a novel notion of scoring consensus patterns, discrete information content. MITRA-PSSM is available for public use via webserver at http://www.calit2.net/compbio/mitra/.

References

T. L. Bailey and C. Elkan. Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning, 21:51, 1995.]] Google ScholarDigital Library
Yoseph Barash, Gill Bejerano, and Nir Friedman. A simple hyper-geometric approach for discovering putative transcription factor binding sites. In Proceedings of WABI, 2001.]] Google ScholarDigital Library
O.G. Berg and P. H. von Hippel. Selection of DNA binding sites by regulatory proteins. Trends Biochem Sci., 13:207--211, Jun 1998.]]Google ScholarCross Ref
J. Buhler and M. Tompa. Finding motifs using random projections. In Proceedings of the Fifth Annual International Conference on Computational Molecular Biology (RECOMB01), pages 69--76, 2001.]] Google ScholarDigital Library
A. P. Dempster, N. Laird, and D. Rubin. Maximum likelihood for incomplete data via the EM algorithm. Journal of the Royal Statistical Society, ser. B, 39:1--38, 1977.]]Google Scholar
Alain Denise, Mireille Régnier, and Mathias Vandenbogaert. Assessing the statistical significance of overrepresented oligonucleotides. In Proceedings of WABI, 2001.]] Google ScholarDigital Library
E. Eskin, U. Keich, M.S. Gelfand, and P.A. Pevzner. Genome wide analysis of bacterial promoter regions. In Proceedings of the 2003 Pacific Symposium on Biocomputing, 2003.]]Google Scholar
E. Eskin and P. A. Pevzner. Finding composite regulatory patterns in dna sequences. In Special Issue Proceedings of the Tenth International Conference on Intelligent Systems for Molecular Biology (ISMB-2002) Bioinformatics., pages 1:S354--63, 2002.]]Google ScholarCross Ref
D. J. Galas, M. Eggert, and M. S. Waterman. Rigorous pattern--recognition methods for DNA sequences. Journal of Molecular Biology, 186:117--128, 1985.]]Google ScholarCross Ref
G. Hertz and G. Stormo. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics, 15:563--577, 1999.]]Google ScholarCross Ref
J.D. Hughes, P.W. Estep, S. Tavazoie, and G.M. Church. Computational identification of cis-regulatory elements associated with groups of functionally related genes in saccharomyces cerevisiae. Journal of Molecular Biology, 296(5):1205--14, 2000.]]Google ScholarCross Ref
U. Keich and P. A. Pevzner. Finding motifs in the twilight zone. In Proceedings of the 6th International Conference on Computational Molecular Biology (RECOMB 2002), 2002.]] Google ScholarDigital Library
C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald, and J. C. Wootton. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262:208--214, 1993.]]Google ScholarCross Ref
A. Neuwald, J. Liu, and C. Lawrence. Gibbs motif sampling: Detection of bacterial outer membrane repeats. Protein Science, 4:1618--1632, 1995.]]Google ScholarCross Ref
P. A. Pevzner and S. Sze. Combinatorial approaches to finding subtle signals in DNA sequences. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, pages 269--278, 2000.]] Google ScholarDigital Library
M. Sagot. Spelling approximate or repeated motifs using a suffix tree. Lecture Notes in Computer Science, 1380:111--127, 1998.]] Google ScholarDigital Library
E. Segal, Y. Barash, I. Simon, N. Friedman, and D. Koller. From promoter sequence to expression: A probabilistic framework. In Proceedings of the 6th International Conference on Research in Computational Molecular Biology (RECOMB), 2002.]] Google ScholarDigital Library
S. Sinha and M. Tompa. Discovery of novel transcription factor binding sites by statistical overrepresentation. Nucleic Acids Research, 30:5549--5560, 2002.]]Google ScholarCross Ref
S. Sze, M. S. Gelfand, and P. A. Pevzner. Finding subtle motifs in DNA sequences. In Proceedings of Pacific Symposium on Biocomputing, pages 235--246, 2002.]]Google Scholar
M.S. Waterman, R. Arratia, and D.J. Galas. Pattern recognition in several sequences: consensus and alignm ent. Bulletin of Mathematical Biology, 46:515--527, 1984.]]Google ScholarCross Ref

Index Terms

From profiles to patterns and back again: a branch and bound algorithm for finding near optimal motif profiles
1. Applied computing
  1. Life and medical sciences

Recommendations

Identification of specific sequence motifs in the upstream region of 242 human miRNA genes

We have identified novel over-represented and conserved motifs in the upstream regions of human and mouse miRNA stem-loop sequences by means of a new bioinformatic processing regimen. We observed sequence conservation -500bp upstream in 189 human and ...
Read More
Finding functional promoter motifs by computational methods: a word of caution

The standard practice in the analysis of promoters is to select promoter regions of convenient length. This may lead to false results when searching for Transcription Factor Binding Sites (TFBSs), since the sequences may contain coding segments. In such ...
Read More
Pathway to functional studies: pipeline linking phylogenetic footprinting and transcription-factor binding analysis
WISB '06: Proceedings of the 2006 workshop on Intelligent systems for bioinformatics - Volume 73

Identification of transcription-factor binding sites is a critical first step in studying transcriptional regulation of genes. The comparative genomics method of phylogenetic footprinting is based on identifying sequence elements that are conserved ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
RECOMB '04: Proceedings of the eighth annual international conference on Research in computational molecular biology
March 2004
370 pages
ISBN:1581137559
DOI:10.1145/974614
General Chair:
Philip E. Bourne
University of California, San Diego
,
Program Chair:
Dan Gusfield
University of California, Davis
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 March 2004
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
motif-finding
patterns
profiles
transcription factor binding sites
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate148of538submissions,28%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 13
  Total Citations
  View Citations
- 637
  Total Downloads
- Downloads (Last 12 months)1
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

From profiles to patterns and back again: a branch and bound algorithm for finding near optimal motif profiles

RECOMB '04: Proceedings of the eighth annual international conference on Research in computational molecular biology

ABSTRACT

References

Cited By

Index Terms

Recommendations

Identification of specific sequence motifs in the upstream region of 242 human miRNA genes

Finding functional promoter motifs by computational methods: a word of caution

Pathway to functional studies: pipeline linking phylogenetic footprinting and transcription-factor binding analysis

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

From profiles to patterns and back again: a branch and bound algorithm for finding near optimal motif profiles

RECOMB '04: Proceedings of the eighth annual international conference on Research in computational molecular biology

ABSTRACT

References

Cited By

Index Terms

Recommendations

Identification of specific sequence motifs in the upstream region of 242 human miRNA genes

Finding functional promoter motifs by computational methods: a word of caution

Pathway to functional studies: pipeline linking phylogenetic footprinting and transcription-factor binding analysis

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media