InversionView: A General-Purpose Method for Reading Information from Neural Activations

Publication
Mechanistic Interpretability Workshop 2024 at ICML 2024 (oral). Awarded Second Place Prize.