PVT v2: Improved baselines with pyramid vision transformer

Wenhai Wang, Enze Xie, Xiang Li et al.

2022 · Computational Visual Media · 2,215 citations

Transformers have recently led to encouraging progress in computer vision. In this work, we present new baselines that improve the original Pyramid Vision Transformer (PVT v1) with three designs: (i) a linear-complexity attention layer, (ii) an overlapping patch embedding, and (iii) a convolutional feed-forward network. With these modifications, PVT v2 reduces the computational complexity of PVT v1 to linear and provides significant improvements on fundamental vision tasks such as classification, detection, and segmentation. In particular, PVT v2 achieves comparable or better perform…
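To make designs (i) and (ii) concrete, here is a small arithmetic sketch rather than the authors' implementation. It assumes values the abstract does not state: a 7×7 kernel with stride 4 and padding 3 for the overlapping patch embedding, and keys pooled to a fixed 7×7 grid for the linear-complexity attention. The helper names `conv_output_size` and `attention_pairs` are hypothetical.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard convolution output-size formula along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def attention_pairs(num_queries: int, num_keys: int) -> int:
    """Number of query-key pairs an attention layer has to score."""
    return num_queries * num_keys


# Assumed settings (not taken from the abstract).
POOLED_KEYS = 7 * 7  # keys pooled to a fixed 7x7 grid

# Patch embedding: non-overlapping 4x4 patches vs. overlapping 7x7 windows.
# With stride 4 and padding 3, the overlapping version yields the same grid.
non_overlap_grid = conv_output_size(224, kernel=4, stride=4, padding=0)
overlap_grid = conv_output_size(224, kernel=7, stride=4, padding=3)

# Attention cost at two input resolutions.
costs = {}
for side in (224, 448):
    grid = conv_output_size(side, kernel=7, stride=4, padding=3)
    tokens = grid * grid
    costs[side] = {
        "tokens": tokens,
        "full": attention_pairs(tokens, tokens),         # quadratic in tokens
        "pooled": attention_pairs(tokens, POOLED_KEYS),  # linear in tokens
    }

full_growth = costs[448]["full"] / costs[224]["full"]
pooled_growth = costs[448]["pooled"] / costs[224]["pooled"]

if __name__ == "__main__":
    print("grid sizes:", non_overlap_grid, overlap_grid)
    print("cost growth when the input side doubles:", full_growth, pooled_growth)
```

Under these assumptions, doubling the input side quadruples the token count, so full attention scores 16 times as many pairs while the pooled variant scores only 4 times as many, which is the sense in which its cost is linear in the number of tokens.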
