Computer Code

Developing AI coding assistants? Test their code generation, completion, translation, summarization, bug detection, and program repair capabilities.

6 tasks 8 datasets 10 results

Code Generation

Generating code from natural language descriptions (HumanEval, MBPP).

8 datasets 10 results
HumanEval: Hand-Written Evaluation Set (2021)
SOTA: 92.4% pass@1 (o1-preview)

164 hand-crafted Python programming problems with function signatures, docstrings, and unit tests. Standard benchmark for code generation.
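HumanEval scores such as the SOTA figure above are reported as pass@k. The HumanEval paper (Chen et al., 2021) estimates it without bias by generating n ≥ k samples per problem, counting the c samples that pass every unit test, computing 1 − C(n−c, k) / C(n, k), and averaging over problems. A minimal sketch of that estimator; the function names and the sample counts are illustrative, not taken from any official harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem.

    n: samples generated, c: samples passing all tests, k: sample budget.
    Returns 1 - C(n - c, k) / C(n, k), the probability that at least one of
    k samples drawn without replacement from the n is correct.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k over (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts for three problems, 10 samples each.
results = [(10, 3), (10, 0), (10, 10)]
print(benchmark_pass_at_k(results, k=1))  # mean of 0.3, 0.0 and 1.0
```

For k = 1 the estimator reduces to c / n, the fraction of correct samples, which is why pass@1 can be read directly as a percentage.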

MBPP: Mostly Basic Python Problems (2021)
SOTA: 89.2% pass@1 (Claude 3.5 Sonnet)

974 crowd-sourced, entry-level Python programming problems. Covers programming fundamentals and the standard library.
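MBPP pairs each task description with a few assert-style test cases, and a generated program counts as correct only if every assert passes. A rough sketch of that check, assuming the tests are available as Python assert strings and the candidate as source text; the problem and both candidates below are hypothetical. It calls `exec` with no sandbox or timeout, so it is only safe for trusted code; real harnesses isolate execution.

```python
def passes_tests(candidate_src: str, test_asserts: list[str]) -> bool:
    """Return True if the candidate source satisfies every assert string."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)  # define the candidate function(s)
        for test in test_asserts:
            exec(test, namespace)  # each test is a single assert statement
    except Exception:  # AssertionError, SyntaxError, NameError, runtime errors
        return False
    return True


# Hypothetical MBPP-style task: "Write a function to find the minimum of three numbers."
tests = [
    "assert min_of_three(10, 20, 0) == 0",
    "assert min_of_three(19, 15, 18) == 15",
    "assert min_of_three(-10, -20, -30) == -30",
]
good = "def min_of_three(a, b, c):\n    return min(a, b, c)\n"
bad = "def min_of_three(a, b, c):\n    return a if a < b else b\n"
print(passes_tests(good, tests))  # True
print(passes_tests(bad, tests))   # False: ignores the third argument
```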

HumanEval+: Extended Version (2023)

HumanEval with 80x more test cases. Tests code robustness and edge-case handling.
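Extra tests matter because a candidate can pass a benchmark's original handful of tests and still be wrong on inputs those tests never exercise. A toy illustration with a hypothetical problem and hand-written tests (not actual HumanEval+ data): the candidate passes the base tests but fails once an even-length input is added.

```python
def median(xs: list[float]) -> float:
    """Buggy candidate: returns the upper middle element for even-length input."""
    s = sorted(xs)
    return s[len(s) // 2]


def run(tests: list[tuple[list[float], float]]) -> bool:
    """True if the candidate returns the expected value for every input."""
    return all(median(xs) == expected for xs, expected in tests)


base_tests = [([3, 1, 2], 2), ([5], 5), ([7, 9, 8], 8)]
extended_tests = base_tests + [([1, 2, 3, 4], 2.5), ([2, 2], 2)]

print(run(base_tests))      # True
print(run(extended_tests))  # False: [1, 2, 3, 4] yields 3, not 2.5
```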

MBPP+: Extended Version (2023)

MBPP with additional test cases, built on 399 hand-verified problems from MBPP-sanitized.

APPS: Automated Programming Progress Standard (2021)

10,000 coding problems from Codewars, AtCoder, Kattis, and Codeforces. Difficulty ranges from introductory to competition level.

CodeContests: Competitive Programming (2022)

13,610 competitive programming problems from Codeforces and other competitive programming sites. ~200 private test cases per problem. 12+ programming languages.
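Competitive-programming benchmarks such as APPS and CodeContests typically judge a program by feeding each test input on standard input and comparing its standard output with the expected answer. A minimal sketch of that loop for Python solutions, assuming whitespace-insensitive matching and a hypothetical toy problem; real judges add sandboxing, memory limits, and problem-specific output checkers.

```python
import subprocess
import sys


def judge(solution_src: str, cases: list[tuple[str, str]], timeout: float = 2.0) -> bool:
    """Run solution_src once per (stdin, expected_stdout) case; True if all match."""
    for stdin_text, expected in cases:
        try:
            proc = subprocess.run(
                [sys.executable, "-c", solution_src],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # a timeout counts as a failed case
        if proc.returncode != 0:
            return False  # runtime error
        # Compare whitespace-separated tokens so trailing spaces/newlines are ignored.
        if proc.stdout.split() != expected.split():
            return False
    return True


# Hypothetical problem: read two integers from one line, print their sum.
solution = "a, b = map(int, input().split())\nprint(a + b)\n"
cases = [("1 2\n", "3\n"), ("-5 5\n", "0\n"), ("1000000000 1000000000\n", "2000000000\n")]
print(judge(solution, cases))  # True
```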

SWE-bench: Software Engineering Benchmark (2023)

2,294 real GitHub issues from popular Python repositories. Tests the ability to resolve real-world software engineering tasks.

SWE-bench Verified: Verified Subset of SWE-bench (2024)
SOTA: 49% resolve rate (Claude 3.5 Sonnet)

500 GitHub issues manually verified as solvable by human engineers. A high-quality subset of SWE-bench.
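Both SWE-bench variants score a model by its resolve rate: the share of task instances where the generated patch makes the issue's failing tests pass (FAIL_TO_PASS) without breaking tests that already passed (PASS_TO_PASS). The sketch below computes that rate from per-instance test outcomes; the `InstanceResult` structure and the instance IDs are hypothetical, and the real harness obtains these outcomes by applying each patch and running the repository's tests in a containerized environment.

```python
from dataclasses import dataclass


@dataclass
class InstanceResult:
    instance_id: str
    fail_to_pass: dict[str, bool]  # test name -> passes after the patch?
    pass_to_pass: dict[str, bool]  # test name -> still passes after the patch?

    @property
    def resolved(self) -> bool:
        # Resolved only if every target test now passes and nothing regressed.
        return all(self.fail_to_pass.values()) and all(self.pass_to_pass.values())


def resolve_rate(results: list[InstanceResult]) -> float:
    """Fraction of instances resolved (leaderboards show it as a percentage)."""
    return sum(r.resolved for r in results) / len(results)


results = [
    InstanceResult("repo__a-1", {"test_fix": True}, {"test_old": True}),
    InstanceResult("repo__b-2", {"test_fix": True}, {"test_old": False}),  # regression
    InstanceResult("repo__c-3", {"test_fix": False}, {"test_old": True}),  # not fixed
    InstanceResult("repo__d-4", {"test_fix": True, "test_edge": True}, {}),
]
print(f"{resolve_rate(results):.0%}")  # 50%
```

On the 500-instance Verified subset, for example, a 49% resolve rate corresponds to 245 resolved instances.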

Code Completion

Predicting the next tokens in code sequences.

0 datasets 0 results
No datasets indexed yet.

Code Translation

Converting code between programming languages.

0 datasets 0 results
No datasets indexed yet.

Code Summarization

Generating natural language descriptions of code.

0 datasets 0 results
No datasets indexed yet.

Bug Detection

Identifying bugs and vulnerabilities in code.

0 datasets 0 results
No datasets indexed yet.

Program Repair

Automatically fixing bugs in code.

0 datasets 0 results
No datasets indexed yet.