Computer Code
Developing AI coding assistants? Test code generation, completion, translation, bug detection, and repair capabilities.
Code Generation
Generating code from natural language descriptions (HumanEval, MBPP).
HumanEval: 164 hand-crafted Python programming problems with function signatures, docstrings, and unit tests. Standard benchmark for code generation.
MBPP: 974 crowd-sourced Python programming problems suitable for beginners. Covers programming fundamentals and the standard library.
HumanEval+: Extends HumanEval with 80x more test cases. Tests code robustness and edge-case handling.
MBPP+: Extends MBPP with additional test cases. Uses 399 hand-verified problems from MBPP-sanitized.
APPS: 10,000 coding problems from Codewars, AtCoder, Kattis, and Codeforces. Ranges from introductory to competition level.
13,610 competitive programming problems from Codeforces, with roughly 200 private test cases per problem and coverage of 12+ programming languages.
SWE-bench: 2,294 real GitHub issues from popular Python repositories. Tests the ability to resolve real-world software engineering tasks.
SWE-bench Verified: 500 manually verified GitHub issues confirmed solvable by human engineers. High-quality subset of SWE-bench.
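Benchmarks that ship function signatures, docstrings, and unit tests are typically scored by running the generated code against those tests. The sketch below illustrates that idea under stated assumptions: the task layout (a prompt, a test snippet defining `check(candidate)`, and an entry point name) and the `add` task itself are hypothetical and not copied from any dataset, and `check_candidate` is an illustrative helper, not part of any benchmark's tooling.

```python
def check_candidate(prompt: str, completion: str, test_code: str, entry_point: str) -> bool:
    """Return True if prompt + completion passes every assertion in test_code.

    Warning: exec() runs arbitrary code. Model output should only be
    executed inside a sandbox; this sketch omits that for brevity.
    """
    namespace: dict = {}
    try:
        exec(prompt + completion, namespace)        # defines the candidate function
        exec(test_code, namespace)                  # defines check(candidate)
        namespace["check"](namespace[entry_point])  # raises AssertionError on failure
    except Exception:
        return False
    return True


# Hypothetical task used only for illustration.
PROMPT = 'def add(a, b):\n    """Return the sum of a and b."""\n'
TEST = (
    "def check(candidate):\n"
    "    assert candidate(1, 2) == 3\n"
    "    assert candidate(-1, 1) == 0\n"
)

GOOD_COMPLETION = "    return a + b\n"
BAD_COMPLETION = "    return a - b\n"

print(check_candidate(PROMPT, GOOD_COMPLETION, TEST, "add"))
print(check_candidate(PROMPT, BAD_COMPLETION, TEST, "add"))
```

Suites with more test cases per problem apply the same check with a larger `test_code`, which is why they catch edge cases that a small suite lets through.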
Code Completion
Predicting the next tokens in code sequences.
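One simple way to score a predicted completion is to compare it with the reference continuation, either exactly or by character-level similarity. The following is a minimal sketch of that comparison using Python's standard `difflib`; the function names and the choice of similarity measure are assumptions made for illustration, not a metric prescribed by any particular benchmark.

```python
from difflib import SequenceMatcher


def exact_match(predicted: str, reference: str) -> bool:
    """True when the prediction equals the reference, ignoring outer whitespace."""
    return predicted.strip() == reference.strip()


def similarity(predicted: str, reference: str) -> float:
    """Character-level similarity in [0, 1] based on difflib's matching blocks."""
    return SequenceMatcher(None, predicted.strip(), reference.strip()).ratio()


# Hypothetical reference and predictions for a single completion point.
reference = "return a + b"
print(exact_match("return a + b ", reference))
print(similarity("return a - b", reference))
```

Exact match is strict and easy to interpret, while a similarity score gives partial credit to near misses.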
Code Translation
Converting code between programming languages.
Code Summarization
Generating natural language descriptions of code.
Bug Detection
Identifying bugs and vulnerabilities in code.
Program Repair
Automatically fixing bugs in code.
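A proposed fix is commonly judged by whether a test that fails on the buggy code passes on the patched code. The sketch below shows that check under stated assumptions: `passes`, `is_valid_repair`, and the `last` example are hypothetical and written for illustration only.

```python
def passes(source: str, test_code: str) -> bool:
    """Return True if test_code runs without error against source.

    Warning: exec() runs arbitrary code; sandbox it when the source
    comes from a model. This sketch omits sandboxing for brevity.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        exec(test_code, namespace)
    except Exception:
        return False
    return True


def is_valid_repair(buggy: str, patched: str, test_code: str) -> bool:
    """The test must fail before the patch and pass after it."""
    return (not passes(buggy, test_code)) and passes(patched, test_code)


# Hypothetical off-by-one bug and its fix.
BUGGY = "def last(items):\n    return items[len(items)]\n"
PATCHED = "def last(items):\n    return items[len(items) - 1]\n"
TEST = "assert last([1, 2, 3]) == 3\n"

print(is_valid_repair(BUGGY, PATCHED, TEST))
```

Requiring the test to fail on the original code guards against counting a patch as a repair when the test never exercised the bug.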