Restriction: Must be in the Computer Science Master's or Doctoral programs, or permission of instructor.
Focuses on recent developments in training, aligning, and evaluating long-context language models, developments that have enabled cutting-edge LLMs to process and generate millions of words. Topics include neural architectures (e.g., Transformers, Mamba), context extension via fine-tuning and upscaling, and tasks such as summarization and question answering (QA) over book-length texts.