Project Description

The National Center for Genome Analysis Support (NCGAS) at Indiana University supports the national biological research community in genomic analysis. NCGAS has previously developed a pipeline to assemble and annotate transcriptome (messenger RNA) data collected from a single organism. The student will test this pipeline and make the changes needed for it to assemble and annotate metatranscriptome datasets. Metatranscriptome datasets contain the genetic information of an entire microbial community, which includes multiple microbial species. The goal of this project is to extend the existing pipeline to these new datasets. As was done previously, the results will be provided back to the research community along with documentation to help with their analyses. Throughout this process, the NCGAS team will support and help the student, since learning bioinformatics and computation at this level is highly interactive.

Technology or Computational Component

Students will gain research experience handling large amounts of biological data in a High Performance Computing (HPC) setting, since most of the analysis steps require HPC resources. NCGAS is located in the Cyberinfrastructure Building (CIB) along with the other HPC teams. In addition, students will become comfortable working in a Linux environment and running bash commands to process and format datasets. Depending on their interests, students will also have the opportunity to learn R and Python.