LLMs have amazing capabilities, but they also hallucinate and make reasoning mistakes. Can we understand these abilities and limitations theoretically, as a way of figuring out how to overcome them? The incredible scale of current LLMs makes this a daunting prospect. However, recent research has developed mathematical understanding shedding light on questions such as:
Why do LLMs struggle with tasks (e.g., multiplying 6-digit numbers) that a simple calculator can perform easily?
Which problems can be solved by a transformer in one step? Which require a chain-of-thought? How long does a chain-of-thought have to be?
What are the benefits and limits of allowing LLM agents to act in the world, and to collaborate with each other or with humans?
On which kinds of problems is a transformer likely to generalize well?
What are the differences between transformers and other architectures (e.g., state-space models)?
What possible future architectures – if any – could replace transformers as the backbone of LLMs?
Assuming that AGI can be realized by scaling current LLM architectures, what kinds of abilities could we expect from such an AGI?
Research on these questions draws on fields such as computational complexity, formal language theory, and statistical learning theory.
This seminar will not presuppose knowledge of any of these fields. However, a willingness to engage with technical content is important.
We will discuss recent research papers, both from our own group and from other groups. Much of this content is highly technical, and it is absolutely fine if you do not understand everything in the paper you’ll present. Our focus will be on discussing high-level takeaways for understanding the (in)abilities of current AI systems, and prospects for future developments.
TBD. A few relevant papers are listed below; we will expand this list.
You are very welcome to propose readings that you are interested in, as long as they are related to the topic of the seminar!
The path to AGI:
Scaling laws and emergence:
Formal languages as a tool for understanding transformers’ limitations:
Limitations affecting LLM performance:
Chain of Thought:
Learning and Generalization:
SSMs as a competing architecture:
k-hop reasoning: