CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

Xiaoyi Dong, Jianmin Bao, Dongdong Chen et al.

2022 · 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) · 1,219 citations

We present CSWin Transformer, an efficient and effective Transformer-based backbone for general-purpose vision tasks. A challenging issue in Transformer design is that global self-attention is very expensive to compute, whereas local self-attention often limits the field of interactions of each token. To address this issue, we develop the Cross-Shaped Window self-attention mechanism, which computes self-attention in horizontal and vertical stripes in parallel. Together the two stripes form a cross-shaped window, and each stripe is obtained by splitting the input feature into stripes of equal width. We provide a ma…
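The stripe mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single (H, W, C) feature map, uses the input directly as queries, keys, and values (no learned projections and no positional encoding), and assumes the channels are split in half, with one half attending within horizontal stripes and the other within vertical stripes. The names `stripe_attention`, `cross_shaped_attention`, and `sw` (stripe width) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def stripe_attention(x, sw, horizontal):
    """Self-attention restricted to stripes of width `sw`.

    x: array of shape (H, W, C), used as query, key, and value (an assumption
    made to keep the sketch short; a real model would use learned projections).
    """
    if not horizontal:
        # Treat vertical stripes as horizontal stripes of the transposed map.
        x = x.transpose(1, 0, 2)
    H, W, C = x.shape
    assert H % sw == 0, "stripe width must divide the striped dimension"
    # Group rows into H // sw stripes; each stripe holds sw * W tokens.
    tokens = x.reshape(H // sw, sw * W, C)
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(C)
    out = (softmax(scores) @ tokens).reshape(H, W, C)
    if not horizontal:
        out = out.transpose(1, 0, 2)
    return out


def cross_shaped_attention(x, sw):
    """Run horizontal and vertical stripe attention on separate channel halves.

    The half/half channel split is an assumption of this sketch. Concatenating
    the two results gives each token a cross-shaped region of interaction.
    """
    C = x.shape[-1]
    out_h = stripe_attention(x[..., : C // 2], sw, horizontal=True)
    out_v = stripe_attention(x[..., C // 2:], sw, horizontal=False)
    return np.concatenate([out_h, out_v], axis=-1)


rng = np.random.default_rng(0)
feature = rng.standard_normal((8, 8, 4))
output = cross_shaped_attention(feature, sw=2)
print(output.shape)
```

In this sketch each stripe contains `sw * W` (or `sw * H`) tokens, so attention cost grows with the stripe width rather than with the full H * W token count, which is the trade-off between global and local attention that the abstract describes.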
