Abstract
Algorithmic advances take advantage of the structure of the massive biological data landscape.
Index Terms
- Computational biology in the 21st century: scaling with compressive algorithms