A Formal Framework for Understanding Length Generalization in Transformers

Publication
arXiv preprint