An Empirical Study of Training Self-Supervised Vision Transformers
Xinlei Chen, Saining Xie, Kaiming He
2021 · 2021 IEEE/CVF International Conference on Computer Vision (ICCV) · 1,439 citations
This paper does not describe a novel method. Instead, it studies a straightforward, incremental, yet must-know baseline given the recent progress in computer vision: self-supervised learning for Vision Transformers (ViT). While the training recipes for standard convolutional networks have been highly mature and robust, the recipes for ViT are yet to be built, especially in the self-supervised scenarios where training becomes more challenging. In this work, we go back to basics and investigate the effects of several fundamental components for training self-supervised ViT. We observe that insta…