The course introduces advances in interpretability in machine learning, ranging from inherently interpretable models to post hoc explanations (e.g., feature attributions, counterfactual explanations, and mechanistic interpretability). It also discusses connections between interpretability and robustness, privacy, causality, and trustworthiness, as well as emerging research challenges in the interpretability and trustworthiness of large generative models.
Prerequisite: ENEE436 or a comparable introductory machine learning course.