Figure unveils first-of-its-kind brain for humanoid robots after shunning OpenAI
Helix introduces a novel approach to upper-body manipulation control.
Updated: Feb 20, 2025 01:46 PM EST
Kapil Kajal
In a significant move in the AI world, California-based Figure has revealed Helix, a generalist Vision-Language-Action (VLA) model that unifies perception, language understanding, and learned control to overcome multiple longstanding challenges in robotics.
Brett Adcock, founder of Figure, said that Helix is the most significant AI update in the company’s history.
“Helix thinks like a human… and to bring robots into homes, we need a step change in capabilities. Helix can generalize to virtually any household item,” Adcock said in a social media post.
“We’ve been working on this project for over a year, aiming to solve general robotics. Like a human, Helix understands speech, reasons through problems, and can grasp any object – all without needing training or code. In testing, Helix can grab almost any household object,” he added.
The launch of Helix follows Figure’s announcement of its separation from OpenAI in early February.
Adcock stated at that time, “Figure has achieved a significant breakthrough in fully end-to-end robot AI, developed entirely in-house. We are excited to reveal something that no one has ever seen before in a humanoid within the next 30 days.”
A series of world-first capabilities
According to Figure, Helix introduces a novel approach to upper-body manipulation control.
It offers high-rate continuous control of the entire humanoid upper body, which includes the wrists, torso, head, and individual fingers.
This level of control allows for more nuanced movements and interactions.
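To make that action space concrete, here is a minimal Python sketch of what a whole-upper-body command vector and a high-rate control loop might look like. Figure has not published an API, so the joint groups, joint counts, and method names below are illustrative assumptions, not the company's specification.

```python
import time
import numpy as np

# Hypothetical layout of a whole-upper-body action vector.
# Joint counts are illustrative, not Figure's published specification.
ACTION_LAYOUT = {
    "torso": 3,          # pitch / yaw / roll
    "head": 2,           # pan / tilt
    "left_arm": 7,       # shoulder, elbow, and wrist joints
    "right_arm": 7,
    "left_fingers": 8,   # individual finger joints
    "right_fingers": 8,
}
ACTION_DIM = sum(ACTION_LAYOUT.values())  # 35 degrees of freedom here


def control_loop(policy, robot, hz: float = 200.0) -> None:
    """Stream continuous joint targets to the robot at a fixed, high rate."""
    period = 1.0 / hz
    while True:
        start = time.monotonic()
        obs = robot.get_observation()        # cameras + proprioception
        action = policy(obs)                 # shape: (ACTION_DIM,)
        assert action.shape == (ACTION_DIM,)
        robot.send_joint_targets(action)
        time.sleep(max(0.0, period - (time.monotonic() - start)))
```

The point of the flat layout is that a single policy output drives torso, head, arms, and every finger at once, rather than splitting control across separate subsystems.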
Another important aspect of Helix is its capability for multi-robot collaboration: it can operate simultaneously on two robots, enabling them to work together on shared, long-horizon manipulation tasks involving objects they have not encountered before.
This feature significantly broadens the operational scope of robotics in complex environments.
Additionally, robots equipped with Helix can pick up a wide range of small household items, including many they have never encountered before. The robots act on natural language prompts, which makes them easy to instruct and use.
Helix also takes a distinctive approach to learning: it uses a single set of neural network weights to learn a range of behaviors, such as picking and placing items, using drawers and refrigerators, and interacting across robots.
This eliminates the need for task-specific fine-tuning, streamlining the learning process.
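In practice, that means one checkpoint serves every task, with behavior selected by the prompt alone. A minimal sketch of that inference pattern follows; the policy class, robot interface, and checkpoint name are hypothetical stand-ins, not Figure's real API.

```python
def run_tasks(policy, robot, prompts):
    """Drive several distinct tasks with one loaded policy: behavior is
    selected by the natural-language prompt alone, with no per-task
    fine-tuning or task-specific weights."""
    for prompt in prompts:
        while not robot.task_done():
            obs = robot.get_observation()
            action = policy.act(obs, instruction=prompt)  # same weights every time
            robot.send_joint_targets(action)


# Hypothetical usage: a single checkpoint covers every task below.
# policy = HelixStylePolicy.load("helix_single_checkpoint.pt")
# run_tasks(policy, robot, [
#     "pick up the apple and place it in the bin",
#     "open the top drawer",
#     "hand the bag of cookies to the robot on your right",
# ])
```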
Lastly, Helix runs entirely on embedded low-power GPUs, which makes it suitable for commercial deployment and underscores its practicality for real-world applications.
How Helix integrates with the robots
According to Figure, current robotic systems struggle to adapt quickly to new tasks, often requiring extensive programming or numerous demonstrations.
To address this, Figure drew on the capabilities of Vision-Language Models (VLMs) to let robots generalize their behaviors on demand and perform tasks from natural language instructions.
The solution is Helix, a model designed to control the entire humanoid upper body with high dexterity and speed.
Helix comprises System 1 (S1) and System 2 (S2). S2 is a slower, internet-pre-trained VLM that focuses on scene understanding and language comprehension.
At the same time, S1 is a fast visuomotor policy that converts the information from S2 into real-time robot actions. This division allows each system to operate optimally: S2 for thoughtful processing and S1 for quick execution.
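One way to picture this division of labor is as two loops running at very different rates and communicating through a shared latent vector, roughly as in the sketch below. This is a simplified reconstruction, not Figure's code: the interfaces, latent size, and exact rates are assumptions.

```python
import threading
import time
import numpy as np

# Shared conditioning vector: S2 writes it, S1 reads it.
# The 512-dim size is an assumption for illustration.
latent = np.zeros(512, dtype=np.float32)
lock = threading.Lock()


def system2_loop(vlm, robot, hz: float = 8.0) -> None:
    """Slow 'thinking' loop: the internet-pretrained VLM digests camera
    images and the language command, refreshing the shared latent a few
    times per second."""
    global latent
    while True:
        z = vlm.encode(robot.get_images(), robot.get_command())  # heavy pass
        with lock:
            latent = z
        time.sleep(1.0 / hz)


def system1_loop(policy, robot, hz: float = 200.0) -> None:
    """Fast 'acting' loop: a small visuomotor policy combines the latest
    latent with fresh observations and streams continuous joint targets,
    never blocking on the slow VLM."""
    while True:
        with lock:
            z = latent.copy()
        action = policy(robot.get_images(), robot.get_proprio(), z)
        robot.send_joint_targets(action)
        time.sleep(1.0 / hz)


# Hypothetical wiring: S2 runs in the background, S1 in the foreground.
# threading.Thread(target=system2_loop, args=(vlm, robot), daemon=True).start()
# system1_loop(policy, robot)
```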
“Helix addresses several issues previous robotic approaches faced, including balancing speed and generalization, scalability to manage high-dimensional actions, and architectural simplicity using standard models,” according to Figure.
Additionally, separating S1 and S2 enables independent improvements to each system without reliance on a shared observation or action space.
To train Helix, Figure collected a dataset of around 500 hours of teleoperated behaviors and used an auto-labeling VLM to generate natural language instructions for the demonstrations.
The architecture pairs a 7B-parameter VLM with an 80M-parameter transformer for control; the transformer processes visual inputs and produces responsive control conditioned on the latent representations generated by the VLM.
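In code terms, that asymmetry might look like the following PyTorch-style sketch: nearly all of the parameters sit in the VLM, while the control head stays small enough to run at a high rate. Every dimension and layer count below is an illustrative guess, not Figure's published configuration.

```python
import torch
import torch.nn as nn

class ControlHead(nn.Module):
    """Small (roughly 80M-parameter-scale) transformer that maps visual
    features, proprioception, and the VLM's latent to continuous actions.
    All dimensions here are illustrative, not Figure's published config."""

    def __init__(self, vis_dim=768, prop_dim=64, latent_dim=512,
                 d_model=512, n_layers=8, action_dim=35):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, d_model)
        self.prop_proj = nn.Linear(prop_dim, d_model)
        self.lat_proj = nn.Linear(latent_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.action_out = nn.Linear(d_model, action_dim)

    def forward(self, vis_tokens, proprio, latent):
        # Stack the projected inputs as a short token sequence:
        # visual tokens, one proprioception token, one VLM-latent token.
        tokens = torch.cat([
            self.vis_proj(vis_tokens),                 # (B, T, d_model)
            self.prop_proj(proprio).unsqueeze(1),      # (B, 1, d_model)
            self.lat_proj(latent).unsqueeze(1),        # (B, 1, d_model)
        ], dim=1)
        h = self.encoder(tokens)
        return self.action_out(h[:, -1])               # (B, action_dim)
```

Keeping the control head this small is what would let the fast loop run on an embedded GPU while the large VLM refreshes its latent at a much lower rate.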