Benchmarks for clinical AI.
Evaluating AI performance in healthcare — diagnosis, segmentation, clinical NLP, drug discovery.
Medical benchmarks are where the score translates directly into patient outcomes. We track the ones that are public, reproducible, and hold up under re-scoring.
Live and upcoming.
ABIDE (Autism)→
The standard for autism classification from brain imaging (fMRI). Compare MCBERT, DeepASD, and other SOTA models.
Chest X-Ray AI→
CheXpert, MIMIC-CXR, and NIH ChestX-ray14. Compare CheXNet, CheXzero, and vision-language models.
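The chest X-ray leaderboards we track score multi-label classifiers by per-finding AUROC: the probability that a randomly chosen positive case is ranked above a randomly chosen negative one. A minimal pure-Python sketch of that metric, via the Mann-Whitney rank statistic (function name and toy scores are illustrative, not from any tracked model):

```python
def auroc(scores, labels):
    """AUROC for one finding: probability a random positive is scored
    above a random negative; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: model scores for four images, two with the finding.
print(auroc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

Leaderboards typically report this per finding and then average across findings; this brute-force version is O(pos x neg) and fine for illustration, not for large test sets.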
Medical Segmentation
Benchmarks for organ and tumor segmentation — BraTS for brain tumors, LiTS for liver, and the Medical Segmentation Decathlon (MSD) suite.
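BraTS, LiTS, and MSD all report the Dice similarity coefficient as their primary overlap metric: twice the intersection of predicted and reference masks over the sum of their sizes. A minimal sketch for binary masks flattened to 0/1 lists (function name and toy masks are illustrative):

```python
def dice(pred, truth):
    """Dice coefficient between two binary masks given as flat 0/1 lists.
    Two empty masks count as a perfect match (score 1.0)."""
    inter = sum(p & t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2 * inter / total if total else 1.0

# Toy example: 5-voxel masks agreeing on two foreground voxels.
print(dice([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # 0.666...
```

Real benchmarks compute this per structure (e.g. whole tumor, tumor core) on 3-D volumes; the empty-mask convention varies by challenge, so we note it when re-scoring.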
Clinical NLP
Clinical notes, radiology reports, and medical Q&A — MedQA, PubMedQA, and de-identified MIMIC notes.
Know a benchmark we’re missing?
Medical ML results are scattered across MICCAI, specialty journals, and Kaggle. If there’s a public score we should be tracking, submit it — we verify and append within 48 hours.