What is Google's new internal RL technique?

Google's internal RL technique guides an AI model's internal activations to develop step-by-step reasoning solutions, improving performance on complex tasks.

How does internal RL differ from next-token prediction?

Internal RL steers the AI's internal states towards abstract goals, whereas next-token prediction generates output one token at a time.

What are the potential applications of Google's internal RL?

Internal RL could enable more capable autonomous agents for complex reasoning tasks, real-world robotics, and multi-modal AI without constant human guidance.

Google AI Learns Smarter Reasoning

17 Jan

Summary

New AI technique steers internal activations for reasoning.
Internal RL bypasses token-by-token prediction limits.
This could enable autonomous agents for complex tasks.

Researchers at Google have introduced a new method called internal reinforcement learning (internal RL) to improve AI's ability to handle complex reasoning tasks. This technique steers the AI model's internal activations, guiding it towards developing high-level, step-by-step solutions rather than relying on traditional next-token prediction. This approach aims to overcome the limitations of autoregressive models, which struggle with long-horizon planning and sparse rewards.
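To see why long horizons and sparse rewards are hard for step-by-step exploration, consider a toy calculation. It is not taken from the research. It assumes a task that pays a reward only if every decision in a sequence is correct, and an agent that guesses uniformly at random; the numbers (4 choices, 20 low-level steps, 4 high-level subgoals) are purely illustrative.

```python
def random_success_probability(choices_per_step: int, steps: int) -> float:
    """Chance that uniform random guessing gets every step right.

    Toy assumption: the reward is sparse, so it is paid only when the
    whole sequence is correct, and each step has exactly one right choice.
    """
    return (1.0 / choices_per_step) ** steps


# Hypothetical numbers, chosen only for illustration.
token_level = random_success_probability(choices_per_step=4, steps=20)
subgoal_level = random_success_probability(choices_per_step=4, steps=4)

print(f"exploring 20 low-level steps: {token_level:.3e}")
print(f"exploring 4 high-level subgoals: {subgoal_level:.3e}")
```

Under these assumptions, searching over a handful of high-level decisions is many orders of magnitude more likely to reach the reward than searching over every low-level step, which is the intuition behind steering a model at the level of abstract goals.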

The internal RL method uses an "internal neural network controller" that modifies the model's internal activations. The controller learns high-level actions through self-supervised learning: it analyzes sequences of behavior and infers the underlying intent. The researchers found that applying this controller to a frozen pre-trained model was more effective, enabling it to discover key subgoals without human labels.
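The idea of a controller acting on a frozen model's activations can be sketched in a few lines. This is a minimal illustration, not the researchers' implementation: the tiny two-layer "model", the sizes, and the names `STEERING` and `forward` are all hypothetical, and the controller here is reduced to one additive steering vector per high-level action.

```python
import math
import random

random.seed(0)

HIDDEN = 4      # width of the hidden activation (hypothetical)
VOCAB = 3       # number of output tokens (hypothetical)
N_ACTIONS = 2   # number of high-level actions the controller can pick


def rand_matrix(rows, cols, scale=1.0):
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# "Pre-trained" weights. They are frozen: nothing below modifies them.
W_IN = rand_matrix(HIDDEN, HIDDEN)
W_OUT = rand_matrix(VOCAB, HIDDEN)

# Controller parameters: one steering vector per high-level action.
# In a real system these would be learned; here they are random stand-ins.
STEERING = rand_matrix(N_ACTIONS, HIDDEN, scale=0.5)


def forward(x, action=None):
    """Run the frozen model, optionally shifting its hidden activation."""
    hidden = [math.tanh(z) for z in matvec(W_IN, x)]
    if action is not None:
        # The controller acts on the internal state, not on output tokens.
        hidden = [h + s for h, s in zip(hidden, STEERING[action])]
    return matvec(W_OUT, hidden)


x = [1.0, -0.5, 0.25, 0.0]
base_logits = forward(x)
steered_logits = forward(x, action=1)
print("base:   ", base_logits)
print("steered:", steered_logits)
```

The same frozen weights produce different output scores depending on which high-level action is selected, so only the small set of controller parameters would need to be trained.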

Experiments demonstrated that internal RL significantly outperforms traditional methods like GRPO on complex tasks with sparse rewards. This advancement could lead to the development of autonomous agents capable of handling intricate reasoning and real-world robotics, potentially offering a more efficient path to advanced AI capabilities.