Benchmarks for clinical AI.
Evaluating AI performance in healthcare — diagnosis, segmentation, clinical NLP, drug discovery.
Medical benchmarks are where the score translates directly into patient outcomes. We track the ones that are public, reproducible, and hold up under re-scoring.
Live and upcoming.
ABIDE (Autism)→
The standard for autism classification from brain imaging (fMRI). Compare MCBERT, DeepASD, and other SOTA models.
Chest X-Ray AI→
CheXpert, MIMIC-CXR, and NIH ChestX-ray14. Compare CheXNet, CheXzero, and vision-language models.
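The chest X-ray leaderboards we track score multi-label classifiers by per-finding AUROC: the probability that a randomly chosen positive case is ranked above a randomly chosen negative one. A minimal pure-Python sketch of that metric, via the Mann-Whitney rank statistic (function name and toy scores are illustrative, not from any tracked model):

```python
def auroc(scores, labels):
    """AUROC for one finding: probability a random positive is scored
    above a random negative; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: model scores for four images, two with the finding.
print(auroc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

Leaderboards typically report this per finding and then average across findings; this brute-force version is O(pos x neg) and fine for illustration, not for large test sets.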
Medical Segmentation
Benchmarks for organ and tumor segmentation — BraTS for brain tumors, LiTS for liver, and the Medical Segmentation Decathlon (MSD) suite.
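BraTS, LiTS, and MSD all report the Dice similarity coefficient as their primary overlap metric: twice the intersection of predicted and reference masks over the sum of their sizes. A minimal sketch for binary masks flattened to 0/1 lists (function name and toy masks are illustrative):

```python
def dice(pred, truth):
    """Dice coefficient between two binary masks given as flat 0/1 lists.
    Two empty masks count as a perfect match (score 1.0)."""
    inter = sum(p & t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2 * inter / total if total else 1.0

# Toy example: 5-voxel masks agreeing on two foreground voxels.
print(dice([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # 0.666...
```

Real benchmarks compute this per structure (e.g. whole tumor, tumor core) on 3-D volumes; the empty-mask convention varies by challenge, so we note it when re-scoring.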
Clinical NLP
Clinical notes, radiology reports, and medical Q&A — MedQA, PubMedQA, and de-identified MIMIC notes.
Know a benchmark we’re missing?
Medical ML results are scattered across MICCAI, specialty journals, and Kaggle. If there’s a public score we should be tracking, submit it — we verify and append within 48 hours.