Separations in the Representational Capabilities of Transformers and Recurrent Architectures

Publication
arXiv Preprint