Courses - Fall 2026
CMSC
Open Seats as of 04/04/2026 at 10:30 PM
CMSC848R
Selected Topics in Information Processing; Language Model Interpretability
Credits: 3
Grad Meth: Reg
Restriction: Must be in the Computer Science Master's or Doctoral program; or permission of instructor.

This course focuses on state-of-the-art methods for interpreting neural language models and understanding their learned behaviors. We will discuss approaches that aim to understand models' internal mechanisms and representations, as well as approaches that attribute model behaviors back to the training data. We will focus on understanding model behaviors such as hallucination, factuality, memorization, and the elicitation of explanations and reasoning. If time allows, we will discuss recent developments in ameliorating learned behaviors, such as model editing, unlearning, and steering. This is primarily a seminar course focused on paper readings and presentations.