An Image is Worth 16x16 Words
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale is a paper under review for ICLR 2021 that’s been making the rounds on Twitter. I found Yannic Kilcher’s explainer video (which starts with a lovely rant about “double-blind” peer review) a good introduction to the model, which could mark the start of Transformers overtaking convolutional models at the very largest scales of computer vision.