Time Horizon
Time horizon — how long an AI agent can work autonomously before requiring human correction — is arguably the single most important meta-metric for agentic AI. METR's evaluations suggest current frontier agents degrade significantly after 30-60 minutes of autonomous operation, while human software engineers can sustain productive work for hours. The metric matters because economic value scales exponentially with reliable autonomy duration: an agent that works reliably for 8 hours is not 16x more valuable than one that works for 30 minutes — it's qualitatively different, enabling entirely new categories of delegatable work.
METR Time Horizon
Measures the length of tasks AI agents can reliably complete autonomously. Task horizon is the 50th-percentile task length at 50% success. Higher = agent can handle longer multi-step tasks without human intervention.
Top 10
Leading models on METR Time Horizon.
No results yet. Be the first to contribute.
All datasets
1 dataset tracked for this task.
Related tasks
Other tasks in Agentic AI.