DeepMind’s RecurrentGemma: Pioneering Efficiency for Open Small Language Models

Synced
Published in SyncedReview · 3 min read · Apr 20, 2024


In the expansive realm of artificial intelligence and natural language processing, Small Language Models (SLMs) are making significant strides. Unlike their larger counterparts with hefty parameter counts and demanding computational needs, SLMs are sleeker versions crafted for optimal performance even in resource-constrained settings.

In a new paper, RecurrentGemma: Moving Past Transformers for Efficient Open Language Models, a Google DeepMind research team introduces RecurrentGemma, an open language model built on Google’s Griffin architecture. The model reduces memory usage and enables efficient inference on long sequences, opening new possibilities for highly efficient small language models in resource-constrained environments.

Griffin, proposed by Google in February 2024, is a hybrid model that achieves rapid inference when generating long sequences by replacing global attention with a blend of local attention and linear recurrences. The researchers introduce just one modification to the Griffin architecture: multiplying the input embeddings by a constant equal to the square root of the model’s width.
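To make the single architectural change concrete, here is a minimal, illustrative sketch of scaling input embeddings by the square root of the model width. This is not the official implementation; the class name `ScaledEmbedding` and the vocabulary/width values are assumptions chosen only for the example.

```python
import math
import torch
import torch.nn as nn

class ScaledEmbedding(nn.Module):
    """Illustrative sketch: embed tokens, then scale by sqrt(model_width),
    which is the one modification to Griffin described in the paper."""

    def __init__(self, vocab_size: int, model_width: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, model_width)
        self.scale = math.sqrt(model_width)  # constant equal to sqrt of model width

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len) -> (batch, seq_len, model_width), scaled by sqrt(width)
        return self.embed(token_ids) * self.scale


# Example usage with hypothetical sizes (not taken from the paper's config tables)
tokens = torch.randint(0, 32_000, (1, 8))
embeddings = ScaledEmbedding(vocab_size=32_000, model_width=2048)(tokens)
```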

The RecurrentGemma architecture moves away from global attention; instead, it models the sequence through a combination of linear recurrences and local attention. The team pre-trains RecurrentGemma-2B on 2 trillion tokens, beginning with a diverse mix of large-scale general data before refining training on a smaller, higher-quality dataset. For fine-tuning, they adopt a strategy similar to Gemma’s, incorporating a novel RLHF algorithm to optimize the model for generating responses with high reward.
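The efficiency argument rests on the recurrence keeping a fixed-size state instead of a key-value cache that grows with sequence length. The toy sketch below shows that property with a simple element-wise linear recurrence; it is an assumption-laden simplification, not Griffin’s actual recurrent block (which uses learned, input-dependent gating and is interleaved with local attention).

```python
import torch

def linear_recurrence(x: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
    """Toy linear recurrence: h_t = a * h_{t-1} + (1 - a) * x_t.
    The recurrent state `h` has a fixed size (batch, width) regardless of
    sequence length, unlike a transformer's growing KV cache."""
    batch, seq_len, width = x.shape
    h = torch.zeros(batch, width)        # fixed-size state carried across steps
    outputs = []
    for t in range(seq_len):
        h = a * h + (1 - a) * x[:, t]    # per-channel decay `a` in (0, 1)
        outputs.append(h)
    return torch.stack(outputs, dim=1)   # (batch, seq_len, width)


# Example usage with arbitrary sizes; `a` stands in for learned decay parameters
x = torch.randn(1, 16, 256)
a = torch.sigmoid(torch.randn(256))
y = linear_recurrence(x, a)
```

Because the state does not grow with the sequence, memory stays constant during generation, which is the source of the throughput gains the paper reports on long sequences.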

The evaluation of RecurrentGemma-2B spans various domains, employing a blend of automated benchmarks and human assessments. Notably, RecurrentGemma-2B matches Gemma’s performance while achieving superior throughput during inference, particularly on extended sequences.

The code is available on the project’s GitHub. The paper RecurrentGemma: Moving Past Transformers for Efficient Open Language Models is on arXiv.

Author: Hecate He | Editor: Chain Zhang

We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.
