This is an updated version.
When it comes to large language models, it turns out that even 1.5 billion parameters is not large enough. While that was the size of the GPT-2 transformer-based language model that OpenAI released to much fanfare last year, today the San Francisco-based AI company outdid itself, announcing the upgraded GPT-3 with a whopping 175 billion parameters.
GPT-3 adopts and scales up the GPT-2 model architecture — including modified initialization, pre-normalization, and reversible tokenization — and shows strong performance on many NLP tasks and benchmarks in zero-shot, one-shot, and few-shot settings.
The OpenAI researchers say that GPT-3 in some cases approaches the performance of SOTA fine-tuned systems, can generate high-quality samples, and shows strong qualitative performance on tasks defined on the fly.
Recent research has demonstrated substantial gains on many NLP tasks and benchmarks through an approach that uses pretraining on a large corpus of text followed by fine-tuning on a specific task. But current AI systems still largely struggle to perform a new language task from only a few examples or from simple natural language instructions describing the tasks.
By training GPT-3, the researchers show that scaling up language models can greatly improve task-agnostic, few-shot performance, sometimes even making it competitive with prior SOTA approaches. GPT-3 can be applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model.
The researchers evaluated GPT-3 on over two dozen NLP datasets and conducted several novel experiments designed to test rapid adaptation to tasks unlikely to be directly contained in the training set. All evaluations were done under three settings: few-shot learning, one-shot learning, and zero-shot learning.
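To make the three settings concrete, here is a minimal sketch of how a task can be specified purely as text, with zero, one, or several demonstrations placed ahead of the query. The function name `build_prompt`, the `=>` separator, and the translation examples are illustrative assumptions, not the exact prompts used in the paper.

```python
from typing import List, Tuple


def build_prompt(instruction: str,
                 demos: List[Tuple[str, str]],
                 query: str,
                 k: int) -> str:
    """Assemble a text-only prompt containing k demonstrations.

    k = 0 corresponds to zero-shot, k = 1 to one-shot, and k > 1 to few-shot.
    The layout (instruction, then "input => output" lines) is an assumption
    made for illustration only.
    """
    if k < 0 or k > len(demos):
        raise ValueError("k must be between 0 and the number of demonstrations")
    lines = [instruction]
    # Each demonstration is shown as a completed input/output pair.
    for source, target in demos[:k]:
        lines.append(f"{source} => {target}")
    # The query is left open so the model can continue the text.
    lines.append(f"{query} =>")
    return "\n".join(lines)


# Hypothetical demonstrations for an English-to-French task.
demos = [("cheese", "fromage"), ("house", "maison"), ("book", "livre")]
instruction = "Translate English to French:"

zero_shot = build_prompt(instruction, demos, "cat", k=0)
one_shot = build_prompt(instruction, demos, "cat", k=1)
few_shot = build_prompt(instruction, demos, "cat", k=3)

print(few_shot)
```

In all three cases the model's weights stay fixed; only the text of the prompt changes.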
GPT-3 showed strong performance across many NLP datasets on translation, question-answering, and cloze tasks. It also did well on tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. The new model even generated samples of news articles that human evaluators had difficulty distinguishing from human-written texts.
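As a rough illustration of what such synthetic tasks can look like, the sketch below generates a 3-digit addition question and a scrambled word, and scores an answer by exact match. The question wording, the helper names, and the scoring rule are assumptions for illustration and are not taken from the paper.

```python
import random
from typing import Tuple


def make_addition_item(rng: random.Random) -> Tuple[str, str]:
    """Return a (question, answer) pair for adding two 3-digit numbers."""
    a = rng.randint(100, 999)
    b = rng.randint(100, 999)
    return f"What is {a} plus {b}?", str(a + b)


def scramble_word(word: str, rng: random.Random) -> str:
    """Return the letters of `word` in a random order."""
    letters = list(word)
    rng.shuffle(letters)
    return "".join(letters)


def exact_match(prediction: str, answer: str) -> bool:
    """Score a prediction as correct only if it equals the answer exactly,
    ignoring surrounding whitespace (an assumed scoring rule)."""
    return prediction.strip() == answer


rng = random.Random(0)  # fixed seed so the items are reproducible
question, answer = make_addition_item(rng)
scrambled = scramble_word("language", rng)

print(question)
print(scrambled)
```

Because items like these are generated on demand, they are unlikely to appear verbatim in a training corpus, which is what makes them useful for testing rapid adaptation.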
The researchers trained a series of smaller models — ranging from 125 million parameters to 13 billion parameters — to compare their performance against GPT-3 on the three settings. For most tasks, they found relatively smooth scaling with model capacity in all three settings. They also noticed a pattern wherein the gap between zero-shot, one-shot, and few-shot performance often grows with model capacity, which they believe suggests larger models are more proficient meta-learners.
Although the findings show that models still struggle with few-shot learning on some tasks even at the scale of the full GPT-3, the researchers believe very large language models like GPT-3 will become an important ingredient in the development of adaptable, general language systems.
In June, OpenAI released an API for accessing its new AI models, which lets users try them on virtually any English language task through a general-purpose “text in, text out” interface. Designed to be both simple for anyone to use and flexible enough to make machine learning teams more productive, the API runs models with weights from the GPT-3 family with many speed and throughput improvements.
Journalist: Yuan Yuan | Editor: Michael Sarazen