Meta AI’s Sparse All-MLP Model Doubles Training Efficiency Compared to Transformers
Transformer architectures have established the state of the art in natural language processing (NLP) and many computer vision tasks, and recent research has shown that All-MLP (multi-layer perceptron) architectures also hold strong potential in these areas. However, although newly proposed MLP models such as gMLP (Liu et al., 2021a) can match…