Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP

Vedant Palit, Rohan Pandey, Aryaman Arora et al.

2023 · 9 citations

Mechanistic interpretability seeks to understand the neural mechanisms that enable specific behaviors in Large Language Models (LLMs) by leveraging causality-based methods. While these approaches have identified neural circuits that copy spans of text, capture factual knowledge, and more, they remain unusable for multimodal models since adapting these tools to the vision-language domain requires considerable architectural changes. In this work, we adapt a unimodal causal tracing tool to BLIP to enable the study of the neural mechanisms underlying image-conditioned text generation. We demonstr…

