Offline RL.

Offline RL, which learns policies from fixed datasets without further environment interaction, matters because many real-world domains (healthcare, robotics, autonomous driving) cannot afford online exploration. CQL (2020) and IQL (2022) established strong baselines on the D4RL benchmark, while Decision Transformer (2021) recast RL as sequence modeling. A more recent wave uses pretrained language models as policy backbones, blurring the line between offline RL and in-context learning, and libraries such as CORL track reproducibility across a range of algorithms.

1 dataset · 0 results · Canonical metric: normalized_return

§ 02 · Canonical benchmark

The reference dataset.

D4RL HalfCheetah-Medium-v2

Canonical offline RL benchmark environment from D4RL. The halfcheetah-medium-v2 dataset contains 1M transitions collected from a partially trained ("medium") SAC policy. Scores are reported as normalized return, where 0 corresponds to a random policy and 100 to an expert SAC policy.

Primary metric: normalized_return
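
The normalization described above can be sketched as a linear rescaling between two reference returns. This is a minimal illustration, not D4RL's own code: the function name and the two reference values below are hypothetical placeholders, and the actual per-environment reference returns should be taken from the D4RL release.

```python
def normalized_return(raw_return: float,
                      random_return: float,
                      expert_return: float) -> float:
    """Rescale a raw episode return so that the random-policy
    reference maps to 0 and the expert reference maps to 100."""
    if expert_return == random_return:
        raise ValueError("expert_return and random_return must differ")
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Hypothetical reference returns, chosen only for illustration.
# They are NOT the official D4RL values for HalfCheetah.
RANDOM_RETURN = -300.0
EXPERT_RETURN = 12000.0

if __name__ == "__main__":
    # A random-level, a mid-level, and an expert-level raw return.
    for raw in (RANDOM_RETURN, 5850.0, EXPERT_RETURN):
        score = normalized_return(raw, RANDOM_RETURN, EXPERT_RETURN)
        print(f"raw={raw:>9.1f}  normalized={score:.1f}")
```

Because the mapping is linear, a policy can score above 100 if it outperforms the expert reference, or below 0 if it does worse than the random reference.
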
§ 03 · Top 10

Leading models.

Leading models on D4RL HalfCheetah-Medium-v2.

No results yet. Be the first to contribute.

§ 04 · All datasets

Tracked datasets.

1 dataset tracked for this task.

D4RL HalfCheetah-Medium-v2 (canonical) · 0 results · normalized_return
§ 05 · Related tasks

Other tasks in Reinforcement Learning.

Atari Games · Continuous Control