Language models can learn implicit multi-hop reasoning, but only if they have lots of training data

Publication: arXiv preprint