Offline RL
Offline RL, which learns policies from fixed datasets without further environment interaction, matters because many real-world domains (healthcare, robotics, autonomous driving) cannot afford the cost or risk of online exploration. Value-based methods such as CQL (2020) and IQL (2022) established strong baselines on the D4RL benchmark, while Decision Transformer (2021) opened a second line of work by recasting RL as sequence modeling. A more recent wave uses pretrained language models as policy backbones, blurring the line between offline RL and in-context learning, with benchmarking efforts like CORL tracking reproducibility across many algorithms.
D4RL HalfCheetah-Medium-v2
A canonical offline RL benchmark task from D4RL. The halfcheetah-medium-v2 dataset contains 1M transitions collected by a medium-level (partially trained) SAC policy. Scores are reported as normalized return, rescaled linearly so that 0 corresponds to a random policy and 100 to an expert SAC policy; a method that outperforms the expert reference scores above 100.
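The normalization described above is a linear rescaling between two reference returns. A minimal sketch follows, assuming the usual form 100 * (raw - random) / (expert - random). The function name `normalized_score` and the reference returns in the example are hypothetical, chosen only to keep the arithmetic easy to follow. They are not the values D4RL uses for HalfCheetah.

```python
def normalized_score(raw_return: float,
                     random_return: float,
                     expert_return: float) -> float:
    """Rescale a raw episode return so that random -> 0 and expert -> 100."""
    if expert_return == random_return:
        raise ValueError("expert_return and random_return must differ")
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Hypothetical reference returns (illustrative only, not D4RL's constants).
RANDOM_RETURN = -200.0
EXPERT_RETURN = 9800.0

if __name__ == "__main__":
    for raw in (RANDOM_RETURN, 4800.0, EXPERT_RETURN):
        score = normalized_score(raw, RANDOM_RETURN, EXPERT_RETURN)
        print(f"raw return {raw:>8.1f} -> normalized score {score:.1f}")
```

Because the mapping is linear, differences in normalized score are proportional to differences in raw return, but scores are only comparable across papers when the same reference returns are used.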
All datasets
1 dataset tracked for this task.