ABSTRACT
The Bio-Grid REU (Research Experience for Undergraduates) Site offers undergraduate students the opportunity to participate in research activities associated with the Bio-Grid Initiatives conducted at UConn. The initiatives aim to advance the application of modern computing infrastructures and information technology to research and practice in various life-science disciplines. Training seminars equip students with preliminary background knowledge, such as basic parallel programming skills, large-scale data analytics, and middleware support, and introduce them to ongoing life-science research projects that use these computing methods. Students participate in research activities associated with several collaborative projects supported by a campus-wide computational and data grid. The Site was supported by the National Science Foundation from 2008 to 2010 and from 2012 to 2014.
The REU project introduces students to this interdisciplinary research early in their academic careers to spark their interest. It aims to prepare future software engineers to formalize and solve emerging life-science problems, as well as future life-science researchers with a strong background in high-performance computing.
Index Terms
- REU site: bio-grid initiatives for interdisciplinary research and education