ABSTRACT
In the last few years, the completion of human genome sequencing has raised a wide range of challenging new problems in raw data analysis. In particular, the discovery of information implicitly encoded in biological sequences is playing a prominent role in identifying genetic diseases and in deciphering biological mechanisms. This information is usually represented by patterns that occur frequently in the sequences, also called motifs. Motivated by biological observations, the class of structured motifs has received much attention. This paper contributes to this setting by providing an efficient algorithm for the identification of novel classes of structured motifs, where several kinds of "exceptions" (whose biological relevance has recently emerged in the literature) may be tolerated in pattern repetitions.