87% ImageNet Accuracy, 3.8ms Latency: Google’s MobileNetV4 Redefines On-Device Mobile Vision

Synced
Published in SyncedReview
3 min read · Apr 19, 2024

Efficient on-device neural networks offer rapid, real-time, and interactive experiences while safeguarding private data from public internet exposure. Yet, the computational limitations of mobile devices present a formidable challenge in maintaining a delicate balance between accuracy and efficiency.

Addressing this challenge head-on, a recent paper titled “MobileNetV4 — Universal Models for the Mobile Ecosystem,” penned by a Google research team, unveils the latest iteration of MobileNets: MobileNetV4 (MNv4). This cutting-edge model boasts an impressive 87% ImageNet-1K accuracy, coupled with an astonishingly low Pixel 8 EdgeTPU runtime of merely 3.8ms.

At the heart of this breakthrough lies the Universal Inverted Bottleneck (UIB) and Mobile MQA, two revolutionary building blocks seamlessly integrated through a refined NAS recipe to forge a series of universally efficient mobile models.

The UIB block serves as an adaptable cornerstone for efficient network design, flexible enough to accommodate various optimization objectives without inflating search complexity. Building on proven MobileNet components, namely separable depthwise (DW) convolution and pointwise (PW) expansion and projection arranged in an inverted bottleneck structure, the UIB provides a flexible Inverted Bottleneck (IB) block during neural architecture search (NAS), obviating the need for manually crafted scaling rules. Moreover, combined with a SuperNet-based NAS algorithm, this approach enables parameter sharing (>95%) across diverse instantiations, making the search remarkably efficient.
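The idea of a single searchable block with optional depthwise convolutions can be sketched in a few lines. The sketch below is an assumption-laden illustration based on the paper's description, not Google's implementation: the layer names (`dw_conv`, `pw_expand`, etc.) and the two boolean knobs are hypothetical, but they show how two optional DW positions around the pointwise expansion yield four block variants for the NAS to choose from.

```python
# Sketch of the UIB search space (illustrative names, not the official code).
# UIB keeps the classic pointwise expansion -> projection pair and makes two
# depthwise (DW) convs optional: one before the expansion and one between
# expansion and projection. Toggling them yields four block instantiations.

def uib_layer_sequence(dw_before_expansion: bool, dw_after_expansion: bool) -> list:
    layers = []
    if dw_before_expansion:
        layers.append("dw_conv_start")   # optional leading depthwise conv
    layers.append("pw_expand")           # 1x1 pointwise expansion
    if dw_after_expansion:
        layers.append("dw_conv_mid")     # optional middle depthwise conv
    layers.append("pw_project")          # 1x1 pointwise projection
    return layers

# The four instantiations a NAS can pick per block:
VARIANTS = {
    "IB (MobileNetV2-style)": uib_layer_sequence(False, True),
    "ConvNext-like":          uib_layer_sequence(True,  False),
    "ExtraDW":                uib_layer_sequence(True,  True),
    "FFN":                    uib_layer_sequence(False, False),
}
```

Because every variant reuses the same expansion and projection layers, a SuperNet can share those weights across all four instantiations, which is what makes the reported >95% parameter sharing plausible.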

Complementing the UIB, Mobile MQA is an attention block tailored for mobile accelerators, yielding a notable 39% inference speedup. The team also introduces an optimized NAS recipe that improves MNv4 search effectiveness. Together, UIB, Mobile MQA, and the refined NAS recipe give rise to a novel suite of MNv4 models that are largely Pareto-optimal across mobile CPUs, DSPs, GPUs, and specialized accelerators such as the Apple Neural Engine and Google Pixel EdgeTPU.
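The core trick behind multi-query attention (MQA), which Mobile MQA builds on, is sharing a single key/value head across all query heads, cutting memory traffic on accelerators. The NumPy sketch below is a generic MQA illustration under that assumption; it is not Google's Mobile MQA code, and the paper's block additionally downsamples keys and values spatially for further speedups.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_query_attention(x, wq, wk, wv, num_heads):
    """Generic MQA sketch.

    x:  (seq, dim) token features
    wq: (dim, num_heads * head_dim) -- per-head query projection
    wk: (dim, head_dim)             -- ONE shared key head
    wv: (dim, head_dim)             -- ONE shared value head
    """
    seq, _ = x.shape
    head_dim = wk.shape[1]
    q = (x @ wq).reshape(seq, num_heads, head_dim)
    k = x @ wk  # single key head shared by all query heads
    v = x @ wv  # single value head shared by all query heads
    # scores: (num_heads, seq, seq); every head attends over the same k
    scores = np.einsum("qhd,kd->hqk", q, k) / np.sqrt(head_dim)
    attn = softmax(scores, axis=-1)
    out = np.einsum("hqk,kd->qhd", attn, v)  # (seq, num_heads, head_dim)
    return out.reshape(seq, num_heads * head_dim)
```

Compared with standard multi-head attention, the K/V projections shrink from `num_heads * head_dim` to `head_dim` columns, which is exactly the kind of memory-bandwidth saving that matters on mobile accelerators.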

In empirical assessments, MNv4 achieves 87% ImageNet-1K accuracy at a latency of 3.8ms on Pixel 8 EdgeTPU, marking a significant stride in mobile computer vision capabilities. The research team anticipates that their pioneering contributions and analytical framework will catalyze further advancements in mobile computer vision.

The paper MobileNetV4 — Universal Models for the Mobile Ecosystem is on arXiv.

Author: Hecate He | Editor: Chain Zhang

We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.


AI Technology & Industry Review — syncedreview.com | Newsletter: http://bit.ly/2IYL6Y2 | Share My Research http://bit.ly/2TrUPMI | Twitter: @Synced_Global