While contemporary deep learning models continue to achieve outstanding results across a wide range of tasks, these models are known to have huge data appetites. The emergence of large-scale pretrained language models such as OpenAI’s GPT-3 has helped reduce the need for task-specific labelled data in natural language processing (NLP), as the models’ learned contextualized text representations can be fine-tuned for specific downstream tasks with relatively small amounts of training data. More recently, these powerful large language models have also shown they can generate answers for unseen NLP tasks via few-shot inference.
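Few-shot inference typically works by packing a handful of labelled demonstrations into the prompt and letting the model complete the pattern. Here is a minimal sketch, with a hypothetical `generate` function standing in for any large LM’s text-completion API:

```python
# Minimal sketch of few-shot prompting for sentiment classification.
# `generate` is a hypothetical stand-in for any LM text-completion API.

def build_few_shot_prompt(examples, query):
    """Concatenate labelled demonstrations, then the unlabelled query."""
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

examples = [
    ("A gripping story with superb acting.", "positive"),
    ("Dull, predictable, and far too long.", "negative"),
]
prompt = build_few_shot_prompt(examples, "An unexpected delight from start to finish.")
# answer = generate(prompt)  # the model completes the final "Sentiment:" line
print(prompt)
```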

Motivated by this development, a new Google…


Large-scale pretrained language models have achieved state-of-the-art results on many natural language processing (NLP) benchmarks, but these data-hungry models tend to struggle in few-shot learning settings, where only limited training data is available.

To address this issue, a team from the University of Massachusetts Amherst and Google Research has proposed Self-Training with Task Augmentation (STraTA), a novel approach that combines task augmentation and self-training to leverage unlabelled data and improve sample efficiency and model performance on NLP tasks.
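For readers unfamiliar with self-training: in its generic form, a model is trained on the labelled data, used to pseudo-label unlabelled examples, and retrained on the confident ones. The sketch below shows that generic loop with assumed helper functions `train` and `predict_proba`; it is a schematic, not the authors’ exact STraTA procedure:

```python
# Generic self-training loop (schematic; not the exact STraTA algorithm).
# `train` and `predict_proba` are assumed helpers wrapping any classifier.

def self_train(labelled, unlabelled, train, predict_proba,
               threshold=0.9, rounds=5):
    model = train(labelled)
    for _ in range(rounds):
        pseudo = []
        for x in unlabelled:
            probs = predict_proba(model, x)      # dict: class -> probability
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:        # keep only confident predictions
                pseudo.append((x, label))
        if not pseudo:
            break
        model = train(labelled + pseudo)         # retrain on the augmented set
    return model
```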


Large language models (LMs) have become much larger and more powerful in recent years, achieving remarkable results across natural language processing (NLP) tasks such as text generation, translation, question answering and more. But the misuse and unintended behaviours of these trillion-parameter models also pose critical societal threats, particularly potential biases and the generation of “toxic” content such as insults, threats and hate speech.

In the paper Challenges in Detoxifying Language Models, a DeepMind research team critically discusses toxicity evaluation and mitigation methods for contemporary transformer-based English LMs and provides insights toward safer model use and deployment.


AI research and development in recent years has shown that deep neural networks can achieve extremely impressive performance, but often at the cost of enormous computation burdens. For instance, training OpenAI’s GPT-3, which has 175 billion parameters, requires access to huge server clusters with powerful graphics cards, entailing costs that can soar into the millions of dollars.

Two popular approaches designed to alleviate this issue are neural network pruning and distillation. The former removes redundant weights to shrink a network while maintaining comparable accuracy; the latter trains a compact student model to mimic a larger teacher, speeding up inference. Existing network pruning techniques however…
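As a concrete illustration of pruning’s core idea, here is a minimal magnitude-pruning sketch in NumPy (illustrative only; practical pipelines typically also fine-tune the network after pruning):

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights until `sparsity` fraction are zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

w = np.random.randn(4, 4)
pruned = magnitude_prune(w, sparsity=0.75)
print((pruned == 0).mean())  # about 0.75 of the entries are now zero
```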


Learning how to learn is something most humans do well, by leveraging previous experiences to inform the learning processes for new tasks. Endowing AI systems with such abilities, however, remains challenging, as it requires machine learners to learn update rules, which have typically been manually tuned for each task.
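To make “learning an update rule” concrete, consider this toy sketch in which a learner meta-learns a single hyperparameter of its own update rule, the step size, from experience across tasks (purely illustrative; not any specific published method):

```python
# Toy meta-learning of one hyperparameter: the learning rate.
# Illustrative only; real meta-learners learn far richer update rules.

def loss(w, target):
    return (w - target) ** 2

def inner_update(w, target, lr):
    grad = 2 * (w - target)          # d/dw of (w - target)^2
    return w - lr * grad             # one step of the (learned) update rule

lr = 0.01                            # meta-parameter to be learned
meta_lr, eps = 0.01, 1e-3
tasks = [1.0, -2.0, 3.0]             # each task: reach a different target

for _ in range(200):
    for target in tasks:
        w = 0.0
        # Meta-gradient of the post-update loss w.r.t. lr, via finite differences.
        up = loss(inner_update(w, target, lr + eps), target)
        down = loss(inner_update(w, target, lr - eps), target)
        lr -= meta_lr * (up - down) / (2 * eps)

print(lr)  # converges toward 0.5, the optimal step size for this quadratic
```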

The field of meta-learning studies how to enable machine learners to learn how to learn, and is a critical research area for improving the efficiency of AI agents. …


Sequence-to-sequence modelling (seq2seq) with neural networks has become the de facto standard for sequence prediction tasks such as those found in language modelling and machine translation. The basic idea is to use an encoder to transform the input sequence into a context vector, then use a decoder to generate the output sequence, one element at a time, from that vector.
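A bare-bones PyTorch sketch of that encoder/context-vector/decoder pipeline (a schematic illustration, far simpler than production seq2seq models):

```python
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    """Encode the input sequence into a context vector, then decode step by step."""
    def __init__(self, vocab_size, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src, tgt):
        _, context = self.encoder(self.embed(src))           # context vector summarizes src
        dec_out, _ = self.decoder(self.embed(tgt), context)  # decoding conditioned on it
        return self.out(dec_out)                             # next-token logits per position

model = TinySeq2Seq(vocab_size=100)
src = torch.randint(0, 100, (2, 7))   # batch of 2 source sequences, length 7
tgt = torch.randint(0, 100, (2, 5))   # teacher-forced target inputs, length 5
logits = model(src, tgt)              # shape: (2, 5, 100)
```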

Despite their power and impressive achievements, seq2seq models are often sample-inefficient. Also, due to their relatively weak inductive biases, these models can fail spectacularly on benchmarks designed to test for compositional generalization.

The new MIT CSAIL paper Sequence-to-Sequence…


Representation learning is used to summarize the essential features of high-dimensional data and turn them into lower-dimensional representations with desirable properties. A popular heuristic approach is to fit a neural network that maps from the high-dimensional data to a set of labels, taking the top layer of the network as the representation of the inputs.
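Concretely, this means training a classifier and reading off an intermediate layer’s activations as the representation. A minimal PyTorch sketch of the idea, with hypothetical dimensions:

```python
import torch
import torch.nn as nn

# A classifier whose top hidden layer doubles as a learned representation.
net = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),   # body: maps high-dimensional input down
    nn.Linear(128, 32), nn.ReLU(),    # top hidden layer = the representation
    nn.Linear(32, 10),                # classification head over 10 labels
)

# ... train `net` on (input, label) pairs as usual, then:
body = net[:-1]                       # drop the head, keep everything below it
x = torch.randn(5, 784)               # batch of hypothetical 784-dim inputs
representation = body(x)              # 32-dim features, reusable downstream
print(representation.shape)           # torch.Size([5, 32])
```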

However, such heuristic approaches often end up capturing spurious features that do not transfer well, or learning entangled dimensions that are uninterpretable. …


Can too much information hinder an AI model? Consider a vehicle’s lane-keeping feature, whose input is a high-resolution camera delivering millions of bits of real-time information. The model needs only the small fraction of this data related to vehicle orientation to function robustly; processing the rest increases the compute burden, the risk of overfitting, and exposure to adversarial attacks.

Most of the challenges faced by today’s reinforcement learning (RL) algorithms, such as robustness, generalization, transfer, and computational efficiency, are highly correlated with compression — the minimizing of information by filtering out irrelevant data. …


The increasingly impressive performance of deep neural networks (DNNs) in recent years has come at the cost of increasingly high computation burdens. As such, the design of efficient and even optimal architectures is vital for continued DNN development and deployment.

To advance research in this area, a Purdue University team has introduced a self-adaptive algorithm for optimal DNN design. Their adaptive network enhancement (ANE) method learns not only from given information but also from the current computer simulation.


Today more than ever, people are voicing concerns regarding biases in news media. Especially in the political arena, there are accusations of favouritism or disfavour in reporting, often expressed through the emphasizing or ignoring of certain political actors, policies, events, or topics. Many regard this as a corruption of the fourth estate and a rising threat to democracy.

Is it possible to develop objective and transparent data-driven methods to identify such biases, rather than relying on subjective human judgements? MIT researchers Samantha D’Alonzo and Max Tegmark say “yes,” and have proposed an automated method for measuring media bias.

In the…

Synced

AI Technology & Industry Review — syncedreview.com | Newsletter: http://bit.ly/2IYL6Y2 | Share My Research http://bit.ly/2TrUPMI | Twitter: @Synced_Global
