Project Description

Students will assist in constructing a computational model for assessing the re-identification risk of large learning analytics datasets that contain student academic outcomes and demographic information.

Technology or Computational Component

Using a statistical programming environment (Python or R), students will be shown how to construct a "synthetic" dataset that accurately models a population of students at a major university. They will then apply various assumptions about the information contained in this dataset to statistically estimate the chance of re-identification, based on information contained in the data and information available via third parties. Students will be directly involved throughout the development of this model.
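
As a rough illustration of the kind of analysis described above, the following Python sketch builds a small synthetic student table and summarizes re-identification risk by counting how many records share each combination of attributes. It is not the project's actual model. The attribute names, category values, uniform independent sampling, and the helper functions (make_synthetic_students, equivalence_class_sizes, risk_summary) are all hypothetical choices made for this example. It also assumes an adversary who knows the target is in the dataset and has learned the listed attributes from a third-party source.

```python
import random
from collections import Counter

# Hypothetical quasi-identifiers and category values, chosen only for illustration.
QUASI_IDENTIFIERS = {
    "major": ["Biology", "Economics", "History", "Physics", "Nursing"],
    "class_year": [1, 2, 3, 4],
    "gender": ["F", "M", "X"],
    "residency": ["in_state", "out_of_state", "international"],
}


def make_synthetic_students(n, seed=0):
    """Draw n synthetic records.

    Assumption: each attribute is sampled uniformly and independently.
    A realistic model would match the marginals and correlations of a
    real student population.
    """
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in QUASI_IDENTIFIERS.items()}
        for _ in range(n)
    ]


def equivalence_class_sizes(records, columns):
    """Count how many records share each combination of values in `columns`."""
    return Counter(tuple(r[c] for c in columns) for r in records)


def risk_summary(records, columns):
    """Summarize risk for an adversary who knows the values in `columns`."""
    sizes = equivalence_class_sizes(records, columns)
    n = len(records)
    unique = sum(1 for size in sizes.values() if size == 1)
    return {
        # Smallest group size: the dataset is k-anonymous for this k.
        "k": min(sizes.values()),
        # Share of records that are the only one with their combination.
        "unique_fraction": unique / n,
        # A record in a group of size s is matched with probability 1/s;
        # averaging 1/s over all records gives (number of groups) / n.
        "mean_risk": len(sizes) / n,
    }


students = make_synthetic_students(2000)
for cols in (["major"], ["major", "class_year"], list(QUASI_IDENTIFIERS)):
    print(cols, risk_summary(students, cols))
```

Adding attributes to the adversary's assumed knowledge can only split groups further, so the smallest group size never grows and the average risk never falls as columns are added. Comparing the summaries across different column sets is one simple way to explore how assumptions about third-party information change the estimated chance of re-identification.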