Building and analyzing digital library collections for text data mining
Lily Stella
Undergraduate Researcher
Linguistics Major (College of Arts & Sciences)
John Walsh
Faculty Mentor
John Walsh (Luddy School of Informatics, Computing and Engineering)
Project Description
Students will work with me to create, document, and analyze worksets (digital collections) from the 17.5 million volumes in the HathiTrust digital library (https://www.hathitrust.org/). Collections will be based on specific topics and themes, such as children's literature or French poetry. Collections will be analyzed with computational tools supported by the HathiTrust Research Center (https://analytics.hathitrust.org/).
Technology or Computational Component
Students will become familiar with the interfaces and infrastructure of the HathiTrust Digital Library and the HathiTrust Research Center, as well as with library metadata. They will also acquire general research and library skills as we consult bibliographies and other reference resources to inform our collection-building. We will use GitHub (github.com) and the lightweight Markdown markup language to document the collections. We will apply existing tools and libraries in languages such as Python and R to analyze the collections.
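As a rough illustration of the documentation and analysis work described above, the following Python sketch renders a small workset as a Markdown table and counts its volumes by decade. It is only a sketch under stated assumptions: the record fields (id, title, year), the placeholder values, and the function names are hypothetical and do not reflect actual HathiTrust metadata or any HathiTrust Research Center tool.

```python
from collections import Counter

# Hypothetical workset records. The field names and values are placeholders
# for illustration, not real HathiTrust metadata or volume identifiers.
WORKSET = [
    {"id": "example.0001", "title": "Placeholder Title A", "year": 1865},
    {"id": "example.0002", "title": "Placeholder Title B", "year": 1869},
    {"id": "example.0003", "title": "Placeholder Title C", "year": 1902},
]


def volumes_per_decade(records):
    """Count volumes by publication decade (for example, 1865 falls in 1860)."""
    return Counter((record["year"] // 10) * 10 for record in records)


def to_markdown_table(records):
    """Render the records, sorted by year, as a Markdown table."""
    lines = ["| Volume ID | Title | Year |", "| --- | --- | --- |"]
    for record in sorted(records, key=lambda r: r["year"]):
        lines.append(f"| {record['id']} | {record['title']} | {record['year']} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the table that could be pasted into a collection's documentation.
    print(to_markdown_table(WORKSET))
    # Print a simple summary of how the collection is distributed over time.
    for decade, count in sorted(volumes_per_decade(WORKSET).items()):
        print(f"{decade}s: {count}")
```

In practice the records would come from metadata exported for a real workset rather than a hard-coded list, and the resulting table could be committed to the collection's GitHub repository alongside its description.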