Bioinformatics—an introduction for computer scientists
Source: ACM Computing Surveys (CSUR)
Volume 36, Issue 2 (June 2004)
Pages: 122 - 158  
Year of Publication: 2004
ISSN:0360-0300
Author
Jacques Cohen, Brandeis University, Waltham, MA
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1031120.1031122

ABSTRACT

This article introduces computer scientists to the new field of bioinformatics. The field has arisen from biologists' need to use and interpret the vast amounts of data constantly being gathered in genomic research and in its more recent counterparts, proteomics and functional genomics. The ultimate goal of bioinformatics is to develop in silico models that complement in vitro and in vivo biological experiments. The article provides a bird's-eye view of the basic concepts in molecular cell biology, outlines the nature of the existing data, and describes the kinds of computer algorithms and techniques needed to understand cell behavior. The underlying motivation for many bioinformatics approaches is the evolution of organisms and the complexity of working with incomplete and noisy data. Topics covered include descriptions of current software developed especially for biologists, computer and mathematical cell models, and the areas of computer science that play an important role in bioinformatics.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
For an interesting graphical gallery of biology (downloadable drawings), consult the site sponsored by the National Health Museum: http://www.accessexcellence.org/AB/GG/.
 
2
A recommended glossary of genetic terms http://www.ornl.gov/TechResources/Human_Genome/publicat/primer2001/glossary.html.
 
3
NCBI (National Center for Biotechnology Information) http://www.ncbi.nlm.nih.gov.
 
4
A summary of interesting sites in bioinformatics is given by the URLs.
 
5
Online lectures in bioinformatics (Heidelberg) http://www.dkfz-heidelberg.de/tbi/bioinfo/Biol/Intro/.
 
6
A special interest group with news and pointers http://www.bioinformatrix.com.
 
7
Bioinformatics Bulletin Board http://bioinformatics.org/faq/#education.
 
8
Bioinformatics resources http://www.brc.dcs.gla.ac.uk/~actan/resources.html.
 
9
Interesting and useful URLs on existing courses.
 
10
The Jackson Laboratory Web page with educational links http://www.jax.org/courses.
 
11
Course in bioinformatics (recommended set of slides by R. L. Bernstein) http://www.swbic.org/education/bioinfo/.
 
12
Highly recommended texts in molecular cell biology {Alberts et al. 2004; Lodish et al. 2003}.
 
13
Some texts in computational biology or bio-informatics {Baldi and Brunak 2002; Baxevanis and Ouellette 1998; Campbell and Heyer 2002; Claverie and Notredame 2003; Durbin et al. 1998; Dwyer 2002; Felsenstein 2003; Gonick and Wheelis 1991; Gusfield 1997; Krane and Raymer 2003; Jones and Pevzner 2004; Mount 2001; Orengo et al. 2003; Pevsner 2003, Pevzner 2000; Setubal and Meidanis 1997; Salzberg et al. 1998; Waterman 1995}.
 
14
Main journals in bioinformatics: Bioinformatics (Oxford University Press); IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB); Journal of Computational Biology (Mary Ann Liebert, Inc., Publishers).
 
15
Note: Many biology journals publish articles related to bioinformatics, e.g., Science, Nature, Nucleic Acids Research, Journal of Molecular Biology, and Proceedings of the National Academy of Sciences (PNAS). In particular, Nucleic Acids Research publishes a compendium of URLs in its yearly January issue.
 
16
Yearly conferences: RECOMB (Research in Computational Molecular Biology); IEEE Computer Society Bioinformatics Conference; PSB (Pacific Symposium on Biocomputing); ISMB (Intelligent Systems for Molecular Biology).
 
17
Articles and Books
18
 
19
Alberts, B., Bray, D., Johnson, A., Lewis, J., Raff, M., Roberts, K., and Walter, P. 2004. Essential Cell Biology, 2nd ed. Garland Publishing.
 
20
Ashburner, M. and Goodman, N. 1997. Informatics: Genome and genetic databases. Curr. Op. Gen. Develop. 7, 750--756.
 
21
22
 
23
Baxevanis, A., and Ouellette, B. F. F. (Eds.). 1998. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Wiley, New York.
 
24
Bennett, C., Li, M., and Ma, B. 2003. Linking chain letters. Sci. Amer. (June) 77--81.
 
25
Brown, M. P. S., Grundy, W. N., Lin, D., Cristianini, N., Sugnet, C. W., Furey, T. S., Ares, Jr., M. and Haussler, D. 2000. Knowledge-based analysis of microarray gene expression data using support vector machines. Proc. Nat. Acad. Sci. 97, 1, 262--267.
 
26
Campbell, A. M. and Heyer, L. 2002. Discovering Genomics, Proteomics and BioInformatics. Benjamin Cummings.
 
27
Claverie, J. M. and Notredame, C. 2003. Bioinformatics for Dummies. Wiley, New York.
 
28
Cohen, J. 2001. Classification of approaches used to study cell regulation: Search for a unified view using constraints and machine learning. Electronic Transactions in Artificial Intelligence, Machine Intelligence 18. Linköping Electronic Articles in Computer and Information Science ISSN 1401-9841, 6(025).
 
29
Cohen, J. 2003. Guidelines for establishing undergraduate bioinformatics courses. J. Sci. Educat. Tech. 12, 4 (Dec.) 449--456.
 
30
DeJong, H. 2002. Modeling and simulation of genetic regulatory systems: A literature review. J. Comput. Biol. 9, 1, 67--103.
 
31
Delcher, A., Kasif, S., Fleischmann, R. D., Peterson, J., White, O., and Salzberg, S. L. 1999. Alignment of whole genomes. Nucl. Acids Res. 27, 11, 2369--2376.
 
32
Duenwald, M. 2003. Gene is linked to susceptibility to depression. The New York Times, July 18, Sect. A, Page 14, Col. 1.
 
33
Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. 1998. Biological Sequence Analysis. Cambridge University Press, Cambridge, UK.
 
34
 
35
Felsenstein, J. 2003. Inferring Phylogenies, Sinauer Associates.
36
 
37
Gilbert, D. R., Westhead, D. R., Nagano, N., and Thornton, J. M. 1999. Motif-based searching in TOPS protein topology databases. Bioinformatics 15, 4, 317--326. Also see http://www.sander.embl-ebi.ac.uk/tops/.
 
38
Gonick, L. and Wheelis, M. 1991. A Cartoon Guide to Genetics. Harper Perennial.
 
39
Goodman, N. 2002. Biological data becomes computer literate: new advances in bioinformatics. Curr. Op. Biotech. 13, 66--71.
 
40
 
41
42
43
 
44
Jones, N. C. and Pevzner, P. A. 2004. An Introduction to Bioinformatics Algorithms, MIT Press, Cambridge, Mass.
 
45
Karp, P. 2001. Pathway databases: A case study in computational symbolic theories. Science 293, 2040--2044.
 
46
Kelly, H. C. 2003. Terrorism and the biology lab. New York Times Op-Ed Page, July 2.
 
47
Knuth, D. E. 1993. Computer Literacy Bookshops Interview (Dec.) (Available at http://dmoz.org/Computers/History/Pioneers/Knuth,_Donald/).
 
48
Krane, D. and Raymer, M. 2003. Fundamental Concepts of BioInformatics. Benjamin Cummings.
 
49
Krogh, A. 1998. An introduction to hidden Markov models for biological sequences. In S. L. Salzberg, D. B. Searls, and S. Kasif (eds.), Computational Methods in Molecular Biology. Elsevier, Amsterdam, The Netherlands, pp. 45--63.
 
50
 
51
Lathrop, R. H. and Smith, T. F. 1996. Global optimum protein threading with gapped alignment and empirical pair potentials. J. Molec. Biol. 255, 641--665.
 
52
Li, H., Helling, R., Tang, C., and Wingreen, N. 1996. Emergence of preferred structures in a simple model of protein folding. Science 273, 666--669.
 
53
Liang, S., Fuhrman, S., and Somogyi, R. 1998. REVEAL, A general reverse engineering algorithm for inference of genetic network architectures. In Pacific Symposium on Biocomputing 3, pp. 18--29.
 
54
Lodish, H., Berk, A., Matsudaira, P., Kaiser, C. A., Krieger, M., Scott, M. P., Zipursky, L., and Darnell, J. 2003. Molecular Cell Biology. W.H. Freeman.
 
55
Luscombe, N. M., Greenbaum, D., and Gerstein, M. 2001. What is bioinformatics? A proposed definition and overview of the field. Methods Inf. Med. 40, 346--358. (Also available at http://bioinfo.mbb.yale.edu/papers/).
 
56
Miller, W. 2001. Comparison of genomic DNA sequences: Solved and unsolved problems. Bioinformatics 17, 5, 391--397.
 
57
 
58
Mount, D. W. 2001. Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor Press, Cold Spring Harbor, N.Y.
 
59
Myers, G. 1999. Whole-genome DNA sequencing. Comput. Sci. Eng. 1, 3 (May), 33--43.
 
60
Orengo, C. A., Jones, D. T., and Thornton, J. M. 2003. Bioinformatics: Genes, Proteins and Computers. BIOS Scientific Publishers, Oxford, England.
 
61
 
62
Pevsner, J. 2003. Bioinformatics and Functional Genomics. Wiley-Liss.
 
63
Pevzner, P. A. 2000. Computational Molecular Biology: An Algorithmic Approach. MIT Press, Cambridge, Mass.
 
64
Rabiner, L. R. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 2, 257--286.
 
65
Regev, A. and Shapiro, E. 2002. Cellular abstractions: Cells as computation. Nature 419 (Sept.), 419--443.
 
66
Rivas, E. and Eddy, S. R. 2000. The language of RNA: A formal grammar that includes pseudoknots. Bioinformatics 16, 4, 334--340.
 
67
Salzberg, S. L., Searls, D. B., and Kasif, S., Eds. 1998. Computational Methods in Molecular Biology. Elsevier, Amsterdam, The Netherlands.
 
68
Schwartz, S., Zhang, Z., Frazer, K. A., Smit, A., Riemer, C., Bouck, J., Gibbs, R., Hardison, R., and Miller, W. 2000. PipMaker---A web server for aligning two genomic DNA sequences. Genome Res. 10, 4 (Apr.), 577--586.
 
69
Searls, D. B. 1992. The linguistics of DNA. Amer. Sci. 80, 579--591.
 
70
Searls, D. B. 1998. Grand challenges in computational biology. In Computational Methods in Molecular Biology, S. L. Salzberg, D. B. Searls, and S. Kasif, Eds. Elsevier, Amsterdam, The Netherlands.
 
71
Searls, D. B. 2002. The language of genes. Nature 420 (November), 211--217.
 
72
Setubal, J. and Meidanis, J. 1997. Introduction to Computational Molecular Biology, PWS Publishing.
 
73
Thierry-Mieg, N. 2000. Protein-protein interaction prediction for C. elegans. In Knowledge Discovery in Biology, Workshop at PKDD2000 (Conference on Principles and Practice of Knowledge Discovery in Databases) (Lyon, France, Sept.).
 
74
Thompson, J. D., Higgins, D. G., and Gibson, T. J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl. Acids Res. 22, 4673--4680.
 
75
Tomita, M., Hashimoto, K., Takahashi, K., Shimizu, T. S., Matsuzaki, Y., Miyoshi, F., Saito, K., Tanida, S., Yugi, K., Venter, J. C., and Hutchison III, C. A. 1999. E-CELL: Software environment for whole cell simulation. Bioinformatics 15, 1, 72--84.
 
76
Waterman, M. S. 1995. Introduction to Computational Biology: Maps, Sequences and Genomes. CRC Press.
 
77
Watson, J. D. and Berry, A. 2003. DNA: The Secret of Life. Knopf.
78
 
79
Zuker, M. and Stiegler, P. 1981. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucl. Acids Res. 9, 133--148. (Also see http://www.bioinfo.rpi.edu/~zukerm/).