BiFormer: Vision Transformer with Bi-Level Routing Attention

Lei Zhu, Xinjiang Wang, Zhanghan Ke et al.

2023 · 1,037 citations

As the core building block of vision transformers, attention is a powerful tool to capture long-range dependency. However, such power comes at a cost: it incurs a huge computation burden and heavy memory footprint as pairwise token interaction across all spatial locations is computed. A series of works attempt to alleviate this problem by introducing handcrafted and content-agnostic sparsity into attention, such as restricting the attention operation to be inside local windows, axial stripes, or dilated windows. In contrast to these approaches, we propose a novel dynamic sparse attention via…
