Robotics
End-to-end robotics — learning perception, planning, and control in a single model — entered a new era with vision-language-action (VLA) models. Google's RT-2 (2023) showed that a web-pretrained VLM could directly output robot actions, and the open-source Open X-Embodiment dataset (2023) unified data from 22 robot types across 21 institutions. The key tension is generalization: lab demos on specific robots are plentiful, but a single policy that transfers across embodiments, tasks, and environments remains the holy grail, with π₀ (Physical Intelligence, 2024) and Google's RT-X pushing this frontier.
RLBench
Large-scale robot learning benchmark with 100 manipulation tasks
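Policies on RLBench are typically scored by success rate: each task is reset, the policy is stepped until it terminates, and the fraction of successful episodes is reported. The sketch below mimics RLBench's `task.reset()` / `task.step()` interface with a self-contained stub (real RLBench needs the CoppeliaSim simulator); the stub task, random policy, and success rule are illustrative assumptions, not part of the library.

```python
"""Minimal sketch of an RLBench-style evaluation loop.

A stub task with a reset()/step() interface stands in for a simulator-backed
RLBench task; everything here is an illustrative assumption.
"""
import random


class StubReachTask:
    """Stand-in task: reset() -> (descriptions, obs),
    step(action) -> (obs, reward, terminate)."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        descriptions = ["reach the red target"]   # language goal, as in RLBench
        obs = {"joint_positions": [0.0] * 7}      # toy observation
        return descriptions, obs

    def step(self, action):
        self.t += 1
        obs = {"joint_positions": action}
        # Toy success rule: the episode "succeeds" on its final step.
        reward = 1.0 if self.t == self.horizon else 0.0
        terminate = self.t >= self.horizon
        return obs, reward, terminate


def evaluate(task, policy, episodes=5):
    """Run episodes and return the fraction ending with reward > 0."""
    successes = 0
    for _ in range(episodes):
        _, obs = task.reset()
        terminate = False
        while not terminate:
            obs, reward, terminate = task.step(policy(obs))
        successes += reward > 0
    return successes / episodes


def random_policy(obs):
    # Seven arbitrary joint targets, one per arm joint.
    return [random.uniform(-1, 1) for _ in range(7)]


if __name__ == "__main__":
    print(f"success rate: {evaluate(StubReachTask(), random_policy):.2f}")
```

Swapping the stub for a real RLBench `TaskEnvironment` keeps the loop unchanged; only the observation and action contents become simulator-specific.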
Top 10
Leading models on RLBench.
All datasets
2 datasets tracked for this task.
Related tasks
Other tasks in Robotics.