A complete guide to the Vision Transformer (ViT) that revolutionized computer vision
Hello everyone! For those of you who don't know me yet, my name is François and I'm a Research Scientist at Meta. I'm passionate about explaining advanced AI concepts and making them more accessible.
Today it is considered one of the most important contributions to the field of computer vision: the Vision Transformer (ViT).
The Vision Transformer was introduced in 2021 by Alexey Dosovitskiy et al. (Google Brain) in the paper "An Image is Worth 16x16 Words". At the time, Transformers had already been shown to be key to achieving strong performance on NLP tasks, having been introduced in the seminal 2017 paper "Attention Is All You Need".
From 2017 to 2021, there were several attempts to integrate attention mechanisms into convolutional neural networks (CNNs). However, most of these were hybrid models (combining CNN layers with attention layers) that lacked scalability. Google addressed this issue by eliminating convolutions altogether and leveraging its computational power to scale the model.

