ABSTRACT
The availability of large fragments of genomic DNA makes it possible to apply comparative genomics to the identification of protein-coding regions. In this work, a comparative analysis is conducted on homologous genomic sequences of organisms at different evolutionary distances, and non-coding regions are found to be conserved between closely related organisms. In contrast, more distant organisms show much less intron similarity but also less conservation of exon structure. This study seeks to illuminate the impact of evolutionary distance on the performance of the proposed gene-finding program, which is based on cross-species sequence comparison. Based on the findings of the comparative study and on training data sets, we propose a model in which coding sequence is identified by comparing sequences from multiple species, both close and appropriately distant. The reliability of the proposed method is evaluated in terms of sensitivity and specificity, and the results are compared with those obtained by other popular gene prediction programs. Provided that sequences from other species at appropriate evolutionary distances are available, this approach could be applied to newly sequenced organisms for which no species-dependent statistical models are available.
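As an illustration of the evaluation mentioned above, the following is a minimal sketch of nucleotide-level sensitivity and specificity. It assumes the convention common in the gene-prediction literature, where sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP); the abstract does not state which definitions were used. The function names, the half-open (start, end) exon coordinates, and the sample intervals are hypothetical and not taken from the paper.

```python
def exon_mask(exons, length):
    """Mark each nucleotide as coding (True) or non-coding (False).

    Assumption: exons are half-open (start, end) intervals, 0-based.
    """
    mask = [False] * length
    for start, end in exons:
        for i in range(max(start, 0), min(end, length)):
            mask[i] = True
    return mask


def nucleotide_accuracy(true_exons, predicted_exons, length):
    """Return (sensitivity, specificity) at the nucleotide level.

    Assumed definitions (gene-prediction convention, not confirmed
    by the abstract):
      sensitivity = TP / (TP + FN)
      specificity = TP / (TP + FP)
    A value is None when its denominator is zero.
    """
    truth = exon_mask(true_exons, length)
    pred = exon_mask(predicted_exons, length)

    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    fp = sum(1 for t, p in zip(truth, pred) if p and not t)

    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tp / (tp + fp) if (tp + fp) else None
    return sensitivity, specificity


# Hypothetical annotation and prediction on a 100-nucleotide sequence.
annotated = [(10, 20), (40, 60)]
predicted = [(15, 25), (40, 50)]
sn, sp = nucleotide_accuracy(annotated, predicted, 100)
print(sn, sp)
```

Computing both measures per nucleotide keeps the sketch short; an exon-level evaluation would instead count a predicted exon as correct only when both of its boundaries match the annotation.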