Agentic AI
Autonomous Coding
Extended coding tasks without human intervention.
0 datasets, 0 results
Autonomous Coding is a key task in agentic AI. This page lists the standard benchmarks used to evaluate models on it, along with current state-of-the-art results.
Benchmarks & SOTA
No datasets indexed for this task yet.
Contribute on GitHub
Related Tasks
Time Horizon
The length of task, measured by how long it takes a human expert, that an AI agent can complete autonomously with 50% reliability (METR).
HCAST
Human-Calibrated Autonomy Software Tasks - 189 tasks across machine learning engineering, cybersecurity, software engineering, and general reasoning.
RE-Bench
Research Engineering tasks requiring experimentation and implementation.
SWE-bench
Resolving real GitHub issues autonomously.