Courses - Fall 2025
CMSC
Open Seats as of
06/21/2025 at 01:30 PM
CMSC848R
Selected Topics in Information Processing; Language Model Interpretability
Credits: 3
Grad Meth: Reg
The course focuses on state-of-the-art methods for interpreting language models and understanding their learned behaviors. We will discuss approaches centered on both understanding models' internal mechanisms/representations and attributing behaviors back to the training data. We will focus on model tendencies including hallucination, factuality, memorization, and explanation/reasoning elicitation. If time allows, we will discuss recent developments in ameliorating learned behaviors, such as model editing, unlearning, and steering.