Is CodeSOTA a Papers with Code replacement?

CodeSOTA builds on the Papers with Code legacy after Meta shut it down in July 2025. We track 286+ benchmark results across 86 datasets with links to code implementations and paper sources.

Can I use CodeSOTA benchmark data?

Yes. All benchmark data is available as JSON at codesota.com/data/benchmarks.json. You can build dashboards, cite it in papers, or integrate it into your tools.

ThePapers With Codesuccessor

State of the Art,
Verified

Q: What is CodeSOTA?

CodeSOTA is an independent ML benchmark tracking platform that provides verified state-of-the-art results across 17 research areas including computer vision, NLP, reasoning, code generation, speech, medical AI, and more. It serves as the Papers with Code successor with fresh, maintained data and links to implementations.

Q: Are CodeSOTA benchmarks verified?

Yes. CodeSOTA runs benchmarks independently where possible, rather than just aggregating paper claims. All data includes source URLs and access dates for verification.

Independent ML benchmarks across 17 research areas. Track progress, find implementations, compare models.

Vision, NLP, reasoning, code, speech, medical, robotics, and more. All results verified with source links.

Explore Benchmarks AI Building Blocks

286+ benchmark results

17 research areas

143 models tracked

Links to implementations

New Explainer

The Ralph Loop: AGI's Dumbest Secret

What if the secret to AGI is while(true)? Interactive visualizations of Wolfram Rule 30, OODA loops, and the $50K→$297 technique.

Explore the loop

while(true)

Explore Research Areas

17 domains. 286+ benchmarks. Find SOTA for your task.

Computer Vision

Detection, segmentation, classification, OCR

10 tasks

NLP

Language models, QA, translation, NER

9 tasks

Reasoning

Mathematical, logical, commonsense

MATH, GSM8K

Code

Generation, SWE-bench, debugging

6 tasks

Speech

ASR, TTS, speaker verification

5 tasks

Medical

Imaging, diagnosis, clinical NLP

4 tasks

Multimodal

Vision-language, VQA, text-to-image

5 tasks

Agentic AI

Autonomous agents, time horizon, HCAST

5 tasks

View all 17 research areas

Trending Models

December 2025 releases analyzed

All news

Code2025-12-30

MiniMax M2.1: 74% SWE-bench at 90% Lower Cost

229B MoE model beats Claude Sonnet 4.5 at $0.30/1M tokens

Reasoning2025-12-29

GLM-4.7: 95.7% AIME 2025

Zhipu AI's 358B model sets math reasoning records

3D2025-12-28

TRELLIS.2: 3D Assets in 3 Seconds

Microsoft's 4B model generates game-ready PBR assets

NLP2025-12-28

Hunyuan MT Beats Google Translate

Tencent's 1.8B model wins WMT2025, runs on phones

Free PDF Download

The Zen of AI Composition

Building intelligent systems from first principles. A philosophical guide to AI transformations, modular composition, and evidence-based prompting.

Download now

CodeSOTA

The Zen of AI Composition

Kacper Wikiel

New Feature

AI Building Blocks

Stop searching. Start building. See which tools transform your data - with production-ready implementations.

Interactive Explainers

3Blue1Brown-style explanations of paradoxes and counterintuitive results. Play with simulations, not just read about them.

In-Depth Comparisons

All guides

Agentic

SWE-bench SOTA

Which AI agents solve real GitHub issues? Latest scores on SWE-bench Verified.

Reasoning

Mathematical Reasoning

MATH, GSM8K, GPQA benchmarks. How models tackle competition-level problems.

Speech

Speech Recognition

LibriSpeech WER scores. Whisper, Conformer, and multilingual ASR compared.

Vision

Document OCR

OmniDocBench, OCRBench results. 50+ models tested on real documents.

Medical

Chest X-ray AI

CheXpert, MIMIC-CXR benchmarks. AUROC scores for radiology models.

Reference

AI Building Blocks

Input-to-output transformations. Find the right architecture for your task.

Can I trust these numbers?

Numbers from published papers, verified with our own tests where possible. No marketing claims, no sponsored rankings.

Which model fits my use case?

Compare accuracy, speed, cost, and deployment complexity. We show you the tradeoffs that matter for production.

Can I use this data?

Yes. All benchmark data available as JSON. Build dashboards, cite it in papers, integrate it into your tools.

286+

Benchmark results

Research areas

Datasets tracked

143

Models compared

Open Data

Use This Data

All benchmark data available as JSON

Build dashboards, cite in papers, integrate into your tools. No API key needed. Updated weekly with new results.

Download JSON View Methodology

Free to use

Source links included

Updated weekly

Frequently Asked Questions

What is CodeSOTA?

CodeSOTA is an independent ML benchmark tracking platform. We provide verified state-of-the-art results across 17 research areas including computer vision, NLP, reasoning, code generation, speech, medical AI, robotics, and more.

Is this a Papers with Code replacement?

CodeSOTA builds on the Papers with Code legacy after Meta shut it down in July 2025. We track 286+ benchmark results with links to implementations. Read the full story.

Are these benchmarks verified?

Yes. We run benchmarks independently where possible, rather than just aggregating paper claims. All data includes source URLs and access dates for verification. See our methodology.

Can I use this benchmark data?

Yes. All benchmark data is available as JSON at /data/benchmarks.json. Build dashboards, cite it in papers, or integrate it into your tools.

What People Say

Anonymous

AI Consultant, Voice-AI at scale

"Outstanding work. Just yesterday I was searching for good OCR comparisons and found only marketing BS. Good job!"

December 2025

Anonymous

Senior Architect

"Super clean, slop-free UI, but most importantly the copy: very precise positioning and project overview."

December 2025

Cite CodeSOTA

If you use CodeSOTA in your research, please cite:

@misc{wikiel2025codesota,
  author = {Wikieł, Kacper},
  title = {CodeSOTA: Independent ML Benchmark Tracking},
  year = {2025},
  url = {https://codesota.com},
  note = {Accessed: 2025}
}

Or in plain text: Wikieł, K. (2025). CodeSOTA: Independent ML Benchmark Tracking. https://codesota.com

Want updates on new benchmarks?

We'll let you know when we add tests for new models or tasks.

No spam. Unsubscribe anytime.

State of the Art,Verified

The Ralph Loop: AGI's Dumbest Secret

Explore Research Areas

Computer Vision

NLP

Reasoning

Code

Speech

Medical

Multimodal

Agentic AI

Trending Models

MiniMax M2.1: 74% SWE-bench at 90% Lower Cost

GLM-4.7: 95.7% AIME 2025

TRELLIS.2: 3D Assets in 3 Seconds

Hunyuan MT Beats Google Translate

The Zen of AI Composition

AI Building Blocks

Interactive Explainers

In-Depth Comparisons

SWE-bench SOTA

Mathematical Reasoning

Speech Recognition

Document OCR

Chest X-ray AI

AI Building Blocks

Can I trust these numbers?

Which model fits my use case?

Can I use this data?

Use This Data

Frequently Asked Questions

What is CodeSOTA?

Is this a Papers with Code replacement?

Are these benchmarks verified?

Can I use this benchmark data?

What People Say

Anonymous

Anonymous

Cite CodeSOTA

Want updates on new benchmarks?

State of the Art,
Verified