Identifying similar datasets from the Sequence Read Archive (SRA) to help with genomic analysis (Mentor: Bhavya Papudeshi)
Informatics Major (School of Informatics, Computing, & Engineering)
Biology Major (College of Arts & Sciences)
Thomas Doak (University Information Technology Services (UITS))
The National Center for Genome Analysis Support (NCGAS) at Indiana University supports the national biological research community with genomic analysis. The team works on many genomic projects, ranging from microbes to plants and animals, and also benchmarks genomic software. The student will work on one of these genome projects under the team's guidance. The student will run the existing pipeline, developed by last year's CEW&T REU students, to search the SRA for similar datasets, and will use other pipelines to assemble and annotate these datasets to understand their biological functions. The goal of this project is to extend the previously developed pipeline with additional steps that follow the identification of datasets of interest. The project will remain flexible: the student will be able to choose the biological data they would like to work with and the research questions they would like to focus on. As was done previously, the resulting research will be shared with the research community, along with documentation to help them with their own analyses. Throughout the process the NCGAS team will support and help the student; learning bioinformatics and computation at this level is highly interactive.
Technology or Computational Component
Students will gain research experience in handling large amounts of biological data in a high-performance computing (HPC) setting, since most of the analysis steps require HPC resources. NCGAS is located in the Cyberinfrastructure Building (CIB) along with the other HPC teams. In addition, students will become comfortable working in a Linux environment and running bash commands to process and format datasets. Depending on their interests, students will have the opportunity to learn R and Python.