
Theory of Machine Learning for Language Models

This lecture covers modern theory of machine learning (e.g., generalization bounds). We will introduce techniques that are useful across many model families and data domains, and we will especially illustrate these methods with applications to understanding the learning of language models/LLMs.
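As a flavor of the kind of result covered, a classical generalization bound (via Hoeffding's inequality and a union bound, for a finite hypothesis class $\mathcal{H}$ and bounded loss in $[0,1]$) states that with probability at least $1-\delta$ over $n$ i.i.d. training samples, simultaneously for all $h \in \mathcal{H}$:

$$
L(h) \;\le\; \widehat{L}(h) + \sqrt{\frac{\ln|\mathcal{H}| + \ln(1/\delta)}{2n}},
$$

where $L(h)$ is the population loss and $\widehat{L}(h)$ the empirical (training) loss. The course develops such bounds and their refinements for richer model families.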

Content at a glance (tentative):

Prerequisites:

To gauge whether your prerequisites are sufficient, check the early chapters of the notes linked under “Material” and see how accessible they look to you.

Logistics

This lecture is offered in Summer Semester 2026.

When: TBD

Where: TBD

Lecturers: Michael Hahn and Yash Sarrof

Credit Points: 6 CP

Material

Our primary text will be Tengyu Ma’s lecture notes from CS229M at Stanford (see also this iteration). We will treat a selection of key topics from these notes.

We’ll also take a view towards Transformers/language models, based on further readings (e.g., Edelman et al. 2022, Wei et al. 2021, Hahn and Rofin 2024, Huang et al. 2025).

Syllabus

TBD

The syllabus will evolve over the course of the semester; we’ll adjust the selection of topics based on what works best.

Grading

Other