Projects

Project Description

Large Language Models (LLMs), such as GPT, have demonstrated impressive performance and utility in a wide range of applications. In this project, we will explore what affects the confidence of LLM outputs: are the models really confident, or do they merely sound confident because that is what users prefer? We will study the impact of prompt structure on confidence, and we will compare a model's internal confidence (measured from token probabilities) with the confidence it states in its responses. We will then explore approaches to aligning internal and stated confidence, and finally compare model confidence with human confidence on the same responses, trying to explain any discrepancies.
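As a starting point for the internal-confidence comparison described above, per-token log-probabilities can be aggregated into a single score. The sketch below is a minimal illustration, assuming the API exposes log-probabilities for the generated tokens; the function name and aggregation choices are our own, not a prescribed method.

```python
import math

def internal_confidence(token_logprobs):
    """Summarize a model's internal confidence from per-token log-probabilities.

    Returns two common aggregates: the joint probability of the whole
    sequence, and the mean per-token probability.
    """
    joint = math.exp(sum(token_logprobs))  # P(t1) * P(t2) * ... * P(tn)
    mean = sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)
    return joint, mean

# Hypothetical log-probabilities for a three-token answer:
joint, mean = internal_confidence([-0.1, -0.2, -0.05])
```

The joint probability shrinks with answer length, so the mean per-token probability is often easier to compare across responses of different lengths.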

Technology or Computational Component

The project will involve working with LLMs both through a user interface and through an API. The ability to program in Python will be useful both for calling the API and for analyzing the results.
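One standard way to quantify how well stated confidence matches actual accuracy is expected calibration error (ECE). The sketch below is an illustrative assumption about how results might be analyzed in Python; the binning scheme and names are not part of the project specification.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Rough expected calibration error: the weighted average gap between
    stated confidence and empirical accuracy within each confidence bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0 into last bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += len(b) / total * abs(avg_conf - accuracy)
    return ece

# Perfectly calibrated toy data gives an ECE of zero:
ece = expected_calibration_error([1.0, 1.0, 0.0], [True, True, False])
```

The same function can be applied to either internal confidence scores or the confidence the model states in its response, which makes it one possible common yardstick for comparing the two.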