Explaining AI through mechanistic interpretability

Lena Kästner, Barnaby Crook

2024 · European Journal for Philosophy of Science · 35 citations

Abstract Recent work in explainable artificial intelligence (XAI) attempts to render opaque AI systems understandable through a divide-and-conquer strategy. However, this fails to illuminate how trained AI systems work as a whole. Precisely this kind of functional understanding is needed, though, to satisfy important societal desiderata such as safety. To remedy this situation, we argue, AI researchers should seek mechanistic interpretability, viz. apply coordinated discovery strategies familiar from the life sciences to uncover the functional organisation of complex AI systems. Additionall…

