Explaining AI through mechanistic interpretability
Lena Kästner, Barnaby Crook
2024 · European Journal for Philosophy of Science · 28 citations
Abstract Recent work in explainable artificial intelligence (XAI) attempts to render opaque AI systems understandable through a divide-and-conquer strategy. However, this fails to illuminate how trained AI systems work as a whole. Precisely this kind of functional understanding is needed, though, to satisfy important societal desiderata such as safety. To remedy this situation, we argue, AI researchers should seek mechanistic interpretability, viz. apply coordinated discovery strategies familiar from the life sciences to uncover the functional organisation of complex AI systems. Additionall…