Project Description

Students will construct a computational model for assessing the re-identification risk of large learning analytics datasets that contain student academic outcomes and demographic information. Using synthesized data similar to those collected by learning management systems (such as Canvas) and other university systems, students will evaluate threat models for data release scenarios. This is a continuation of a 2019-2020 REUW project.

Technology or Computational Component

Using the R statistical computing environment, students will construct and manipulate a "synthetic" dataset that models a population of students at a major university. They will then use various assumptions about the information in this dataset to statistically estimate the chance of re-identification, drawing on both the data itself and information available via third parties. Students will be directly involved throughout the development of this model.
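
As a rough illustration of the kind of calculation involved, the sketch below builds a small synthetic student table and estimates re-identification risk from the size of each group of records that share the same quasi-identifier values. It is written in Python purely for illustration (the project itself uses R), and the column names, category values, and the choice of a k-anonymity style measure are all assumptions made for this example, not the project's actual model.

```python
import random
from collections import Counter

# Hypothetical quasi-identifiers: attributes a third party might already know.
# These names are assumptions for illustration, not the project's real schema.
QUASI_IDENTIFIERS = ("major", "class_year", "gender", "residency")


def make_synthetic_students(n, seed=0):
    """Generate n synthetic student records with made-up categories."""
    rng = random.Random(seed)  # seeded so the synthetic data is reproducible
    majors = ["Biology", "Computer Science", "English", "History", "Nursing"]
    residencies = ["in-state", "out-of-state", "international"]
    return [
        {
            "major": rng.choice(majors),
            "class_year": rng.choice([1, 2, 3, 4]),
            "gender": rng.choice(["F", "M", "X"]),
            "residency": rng.choice(residencies),
            "gpa": round(rng.uniform(2.0, 4.0), 2),  # sensitive outcome, not a key
        }
        for _ in range(n)
    ]


def reidentification_risk(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Summarize risk for a non-empty list of records.

    Records sharing the same quasi-identifier values form an equivalence
    class. A record in a class of size s is assumed to be re-identified
    with probability 1/s by someone who knows those values.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return {
        # smallest class size (the "k" in k-anonymity)
        "k": min(class_sizes.values()),
        # share of records that are unique on the quasi-identifiers
        "unique_fraction": sum(class_sizes[k] == 1 for k in keys) / len(keys),
        # average per-record risk, 1/s averaged over all records
        "mean_risk": sum(1.0 / class_sizes[k] for k in keys) / len(keys),
    }


if __name__ == "__main__":
    students = make_synthetic_students(1000)
    print(reidentification_risk(students))
```

Different threat models can be explored by changing which columns are treated as quasi-identifiers: the more attributes a third party is assumed to know, the smaller the equivalence classes and the higher the estimated risk.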