review-article

Open Access

Computational biology in the 21st century: scaling with compressive algorithms

Authors:
Bonnie Berger

Massachusetts Institute of Technology, Cambridge, MA

Massachusetts Institute of Technology, Cambridge, MA
View Profile

,
Noah M. Daniels

Massachusetts Institute of Technology, Cambridge, MA

Massachusetts Institute of Technology, Cambridge, MA
View Profile

,
Y. William Yu

Massachusetts Institute of Technology, Cambridge, MA

Massachusetts Institute of Technology, Cambridge, MA
View Profile

Authors Info & Claims

Communications of the ACM Volume 59 Issue 8August 2016pp 72–80https://doi.org/10.1145/2957324

Published:22 July 2016Publication History

Communications of the ACM

Abstract

Algorithmic advances take advantage of the structure of massive biological data landscape.

References

1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 7422 (2012), 56--65.Google ScholarCross Ref
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. Basic local alignment search tool. Journal of Molecular Biology 215, 3 (1990), 403--410.Google ScholarCross Ref
Berger, B., Peng, J. and Singh, M. Computational solutions for omics data. Nature Reviews Genetics 14, 5 (2013), 333--346.Google ScholarCross Ref
Bonfield, J.K. and Mahoney, M.V. Compression of FASTQ and SAM format sequencing data. PLoS ONE 8, 3 (2013), e59190.Google ScholarCross Ref
Bredel, M. and Jacoby, E. Chemogenomics: An emerging strategy for rapid target and drug discovery. Nature Reviews Genetics 5, 4 (2004), 262--275.Google ScholarCross Ref
Bruijn, D.N. A combinatorial problem. In Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen, Series A 49, 7 (1946), 758.Google Scholar
Buchfink, B., Xie, C., and Huson, D.H. Fast and sensitive protein alignment using DIAMOND. Nature Methods 12, 1 (2015), 59--60.Google ScholarCross Ref
Candes, E.J. and Tao, T. Decoding by linear programming. IEEE Transactions on Information Theory 51, 12 (2005), 4203--4215. Google ScholarDigital Library
Cao, M., Zhang, H., Park, J., Daniels, N.M., Crovella, M.E., Cowen, L.J. and Hescott, B. Going the distance for protein function prediction: A new distance metric for protein interaction networks. PLoS ONE 8, 10 (2013).Google ScholarCross Ref
Chindelevitch, L., Trigg, J., Regev, A. and Berger, B. An exact arithmetic toolbox for a consistent and reproducible structural analysis of metabolic network models. Nature Communications 5, (2014).Google Scholar
Cho, H., Berger, B., and Peng, J. Diffusion component analysis: Unraveling functional topology in biological networks. Research in Computational Molecular Biology. Springer, 2015, 62--64.Google Scholar
Daniels, N.M., Gallant, A., Peng, J., Cowen, L.J., Baym, M. and Berger, M. Compressive genomics for protein databases. Bioinformatics 29 (2013), i283--i290.Google ScholarCross Ref
Dobzhansky, T. Nothing in biology makes sense except in the light of evolution (1973).Google Scholar
Forsberg, K.J., Reyes, A., Wang, B., Selleck, E.M., Sommer, M.O. and Dantas, G. The shared antibiotic resistome of soil bacteria and human pathogens. Science 337, 6098 (2012), 1107--1111.Google ScholarCross Ref
Hach, F., Hormozdiari, F., Alkan, C., Hormozdiari, F., Birol, I., Eichler, E.E. and Sahinalp, S.C. mrsFAST: A cache-oblivious algorithm for short-read mapping. Nature Methods 7, 8 (2010), 576--577.Google ScholarCross Ref
Hach, F., Sarra, I. Hormozdiari, F., Alkan, C., Eichler, E.E. and Sahinalp, S.C. mrsFAST-Ultra: a compact, SNP-aware mapper for high-performance sequencing applications. Nucleic Acids Research (2014), gku370.Google Scholar
Hart, Y., Sheftel, H., Hausser, J., Szekely, P., Ben-Moshe, N.B., Korem, Y., Tendler, A., Mayo, A.E. and Alon, U. Inferring biological tasks using Pareto analysis of high-dimensional data. Nature Methods 12, 3 (2015), 233--235.Google ScholarCross Ref
Indyk, P. and Motwani, R. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the 13^th Annual ACM Symposium on Theory of Computing. ACM, 1998, 604--613. Google ScholarDigital Library
Janda, J.M. and Abbott, S.L. 16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls. J. Clinical Microbiology 45, 9 (2007), 2761--2764.Google ScholarCross Ref
Jardine, N. and van Rijsbergen, C.J. The use of hierarchic clustering in information retrieval. Information Storage and Retrieval 7, 5 (1971) 217--240.Google ScholarCross Ref
Liao, C.-S., Lu, K., Baym, M., Singh, R. and Berger, B. IsoRankN: Spectral methods for global alignment of multiple protein networks. Bioinformatics 12 (2009), i253--i258. Google ScholarDigital Library
Loh, P.-R., Baym, M., and Berger, B. Compressive genomics. Nature Biotechnology 30, 7 (2012), 627--630.Google ScholarCross Ref
MacFabe, D.F. Short-chain fatty acid fermentation products of the gut microbiome: Implications in autism spectrum disorders. Microbial Ecology in Health and Disease 23 (2012).Google Scholar
Marco-Sola, S., Sammeth, M., Guigó, R. and Ribeca, P. The gem mapper: Fast, accurate and versatile alignment by filtration. Nature Methods 9, 12 (2012), 1185--1188.Google ScholarCross Ref
Marx, V. Biology: The big challenges of big data. Nature 498, 7453 (2013), 255--260.Google ScholarCross Ref
Ochoa, I., Asnani, H., Bharadia, D., Chowdhury, M., Weissman, T. and Yona, G. QualComp: A new lossy compressor for quality scores based on rate distortion theory. BMC bioinformatics 14, 1 (2013), 187.Google Scholar
Patro, R. and Kingsford, C. Data-dependent bucketing improves reference-free compression of sequencing reads. Bioinformatics (2015).Google Scholar
Prat, Y., Fromer, M., Linial, N. and Linial, M. Recovering key biological constituents through sparse representation of gene expression. Bioinformatics 5 (2011), 655--661. Google ScholarDigital Library
Rahman, S.A., Bashton, M., Holliday, G.L., Schrader, R. and Thornton, J.M. Small molecule subgraph detector (SMSD) toolkit. J. Cheminformatics 1, 1 (2009), 1--13.Google ScholarCross Ref
Rubinfeld, R. and Shapira, A. Sublinear time algorithms. SIAM J. Discrete Mathematics 25, 4 (2011), 1562--1588. Google ScholarDigital Library
Schatz, M.C., Langmead, B. and Salzberg, S.L. Cloud computing and the DNA data race. Nature Biotechnology 28, 7 (2010), 691--693.Google ScholarCross Ref
Singh, R., Xu, J. and Berger, B. Global alignment of multiple protein interaction networks with application to functional orthology detection. In Proceedings of the National Academy of Sciences 105, 35 (2008), 12763--12768.Google ScholarCross Ref
Siragusa, E., Weese, D. and Reinert, K. Fast and accurate read mapping with approximate seeds and multiple backtracking. Nucleic Acids Research 41, 7 (2013), e78.Google ScholarCross Ref
Stephens, Z.D. et al. Big data: Astronomical or genomical? PLoS Biol. 13, 7 (2015), e1002195.Google ScholarCross Ref
Uhlmann, J.K. Satisfying general proximity/similarity queries with metric trees. Information Processing Letters 40, 4 (1991), 175--179.Google ScholarCross Ref
Weinstein, J.N. et al. The cancer genome atlas pan-cancer analysis project. Nature Genetics 45, 10 (2013), 1113--1120.Google ScholarCross Ref
Yorukoglu, D., Yu, Y.W., Peng, J. and Berger, B. Compressive mapping for next-generation sequencing Nature Biotechnology 4 (2016), 374--376.Google Scholar
Yu, Y.W., Daniels, N., Danko, D.C. and Berger, B. Entropy-scaling search of massive biological data. Cell Systems 1, 2 (2015), 130--140.Google ScholarCross Ref
Yu, Y.W., Yorukoglu, D., Peng, J. and Berger, B. Quality score compression improves genotyping accuracy. Nature Biotechnology 33, 3 (2015), 240--243.Google ScholarCross Ref
Zhao, Y., Tang, H. and Ye, Y. RAPSearch2: A fast and memory-efficient protein similarity search tool for next-generation sequencing data. Bioinformatics 28, 1 (2012), 125--126. Google ScholarDigital Library

Index Terms

Computational biology in the 21st century: scaling with compressive algorithms

Recommendations

Computational Non-coding RNA Biology
Read More
Algorithms for Computational Biology: First International Conference, AlCoB 2014, Tarragona, Spain, July 1-3, 2014, Proceedings
Read More
Computational biology unplugged!
WCCCE '09: Proceedings of the 14th Western Canadian Conference on Computing Education

In this workshop, we present an "unplugged" computational biology (or bioinformatics) activity. The activity, which may range from beginner to advanced in difficulty based on the interest of participants, will present a topic of interest in ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Communications of the ACM Volume 59, Issue 8
August 2016
94 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/2975594
Editor:
Moshe Y. Vardi
Association for Computing Machinery, New York, NY
Issue’s Table of Contents
Copyright © 2016 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 22 July 2016
Check for updates
Qualifiers
- review-article
- Popular
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 30
  Total Citations
  View Citations
- 19,717
  Total Downloads
- Downloads (Last 12 months)859
- Downloads (Last 6 weeks)149
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format .

View HTML Format

Computational biology in the 21st century: scaling with compressive algorithms

Communications of the ACM

Abstract

References

Cited By

Index Terms

Recommendations

Computational Non-coding RNA Biology

Algorithms for Computational Biology: First International Conference, AlCoB 2014, Tarragona, Spain, July 1-3, 2014, Proceedings

Computational biology unplugged!

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Check for updates

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

HTML Format

Caption

Computational biology in the 21st century: scaling with compressive algorithms

Communications of the ACM

Abstract

References

Cited By

Index Terms

Recommendations

Computational Non-coding RNA Biology

Algorithms for Computational Biology: First International Conference, AlCoB 2014, Tarragona, Spain, July 1-3, 2014, Proceedings

Computational biology unplugged!

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Check for updates

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

HTML Format

Share this Publication link

Share on Social Media