AI systems now match or exceed radiologist performance in detecting pneumonia, COVID-19, and other thoracic diseases. Track the state of the art in chest X-ray classification.
From raw DICOM images to clinical predictions: understanding how chest X-ray AI works is essential for safe deployment.

Raw chest X-rays arrive as DICOM files. Preprocessing includes contrast enhancement, resizing to 224x224, and normalization to zero mean and unit variance.
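The preprocessing steps above can be sketched directly in NumPy. Function names here are illustrative, not from any particular library; production pipelines typically read pixels with pydicom and resize with bilinear interpolation, but the arithmetic is the same:

```python
import numpy as np

def resize_nearest(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Nearest-neighbour resize to size x size (stand-in for bilinear)."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols]

def preprocess(pixels: np.ndarray) -> np.ndarray:
    """Contrast-stretch, resize to 224x224, then standardize."""
    img = pixels.astype(np.float32)
    # Contrast enhancement via min-max stretch to [0, 1]
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)
    img = resize_nearest(img, 224)
    # Normalize to zero mean and unit variance
    img = (img - img.mean()) / (img.std() + 1e-8)
    return img

# 12-bit DICOM-like pixel array (synthetic placeholder)
fake_dicom = np.random.randint(0, 4096, size=(2048, 2500))
x = preprocess(fake_dicom)
print(x.shape)  # (224, 224)
```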
Most models use a DenseNet-121 backbone pretrained on ImageNet, though Vision Transformers and CLIP-based vision-language models are becoming dominant.
Output is typically 14 binary labels for conditions like Atelectasis, Cardiomegaly, Consolidation, Edema, Pleural Effusion, and Pneumonia.
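Because the 14 labels are independent binary predictions, the output head applies a sigmoid per pathology rather than a softmax over classes. A minimal sketch with made-up logits for six of the labels:

```python
import numpy as np

# Illustrative subset of the 14 pathology labels
LABELS = ["Atelectasis", "Cardiomegaly", "Consolidation",
          "Edema", "Pleural Effusion", "Pneumonia"]

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

# One raw score per label, as a classifier head would produce (values made up)
logits = np.array([1.2, -0.4, 0.1, -2.0, 2.3, 0.8])
probs = sigmoid(logits)  # each label scored independently in (0, 1)

for name, p in zip(LABELS, probs):
    print(f"{name:16s} {p:.3f}")
```

Note that the probabilities need not sum to 1: a patient can have several findings at once, which is exactly why multi-label sigmoids are used instead of a softmax.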

Grad-CAM (Gradient-weighted Class Activation Mapping) reveals which image regions the model relies on for each pathology; the example overlay uses a real chest X-ray from the COVID-19 Image Data Collection.
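The Grad-CAM computation itself is compact: given the last conv layer's feature maps A and the gradient of the target pathology score with respect to them, the heatmap is ReLU(Σₖ αₖAₖ), where αₖ is the spatially averaged gradient for channel k. A sketch with synthetic tensors (real use backpropagates through the CNN, e.g. via PyTorch hooks; the gradients here are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.random((512, 7, 7))    # A: 512 channels, 7x7 spatial grid
grads = rng.standard_normal((512, 7, 7))  # dY_pathology / dA (placeholder)

alpha = grads.mean(axis=(1, 2))           # global-average-pool the gradients
cam = np.maximum((alpha[:, None, None] * feature_maps).sum(axis=0), 0.0)  # ReLU
cam = cam / (cam.max() + 1e-8)            # scale to [0, 1] for overlay

print(cam.shape)  # (7, 7) heatmap, upsampled to image size for display
```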
Stanford's CheXpert is the gold-standard benchmark for chest X-ray classification; the leaderboard below reports mean AUC across the five competition pathologies.
| Rank | Model | Organization | Mean AUC | Architecture |
|---|---|---|---|---|
| #1 | CheXpert AUC Maximizer | Stanford | 93.0% | DenseNet-121 Ensemble |
| #2 | BioViL | Microsoft | 89.1% | Vision-Language Transformer |
| #3 | CheXzero | Harvard/MIT | 88.6% | CLIP-based Vision-Language |
| #4 | GLoRIA | Stanford | 88.2% | Vision-Language (Local + Global) |
| #5 | MedCLIP | Research | 87.8% | CLIP-based Vision-Language |
| #6 | TorchXRayVision | Cohen Lab | 87.4% | DenseNet-121 / ResNet |
| #7 | DenseNet-121 (Chest X-ray) | Research | 86.5% | DenseNet-121 |
How well do models generalize across different chest X-ray benchmarks? (AUC, %; a dash means no reported result.)
| Model | CheXpert | NIH ChestX-ray14 | MIMIC-CXR | VinDr-CXR |
|---|---|---|---|---|
| CheXpert AUC Maximizer | 93.0 | - | - | - |
| BioViL | 89.1 | - | - | - |
| CheXzero | 88.6 | - | 89.2 | - |
| GLoRIA | 88.2 | - | - | - |
| MedCLIP | 87.8 | - | - | - |
| TorchXRayVision | 87.4 | 85.8 | 86.3 | 87.9 |
| DenseNet-121 (Chest X-ray) | 86.5 | 82.6 | - | - |
| CheXNet | - | 84.1 | - | - |
Traditional CNNs (CheXNet, DenseNet) dominated until 2022. Now, CLIP-based models like CheXzero and MedCLIP are achieving competitive results with zero-shot transfer.
These models learn from paired image-text data (X-rays + radiology reports), enabling them to classify new conditions without retraining. GLoRIA and BioViL further improve by learning local region-text alignments.
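CLIP-style zero-shot classification can be sketched in a few lines: embed the image and a pair of text prompts ("pneumonia" vs. "no pneumonia") in a shared space, then compare cosine similarities. Random vectors stand in here for the real image and text encoders, and the prompts are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Placeholder embeddings in a shared 512-d space (a real model would
# produce these from the X-ray and from the prompt text)
image_emb = normalize(rng.standard_normal(512))
prompts = ["pneumonia", "no pneumonia"]
text_embs = normalize(rng.standard_normal((2, 512)))  # one row per prompt

sims = text_embs @ image_emb   # cosine similarity, one score per prompt
z = sims * 100.0               # CLIP-style temperature scaling
z -= z.max()                   # numerical stability for the softmax
probs = np.exp(z) / np.exp(z).sum()
print(dict(zip(prompts, probs.round(3))))
```

Swapping in a new prompt pair ("edema" vs. "no edema") classifies a condition the model was never explicitly trained on, which is the zero-shot transfer described above.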
Unlike ImageNet, chest X-ray labels are extracted from radiology reports using NLP, introducing significant noise: negated or uncertain mentions ("no effusion", "cannot exclude pneumonia") are easily mislabeled.
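A toy, negation-aware labeler (our own simplification for illustration, not the actual CheXpert labeler) shows both the idea and the failure mode:

```python
import re

# Negation cues that flip a mention from positive to negative
NEGATIONS = ("no ", "without ", "negative for ")

def label_report(report: str, finding: str) -> int:
    """Return 1 if the finding is mentioned without a negation cue, else 0."""
    for sentence in re.split(r"[.;]", report.lower()):
        if finding in sentence and not any(neg in sentence for neg in NEGATIONS):
            return 1
    return 0

print(label_report("Large right pleural effusion. No pneumothorax.", "effusion"))  # 1
print(label_report("No focal consolidation or effusion.", "effusion"))             # 0
# Failure mode -> label noise: uncertainty is counted as a positive finding
print(label_report("Cannot exclude small effusion.", "effusion"))                  # 1
```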
The NIH ChestX-ray14 dataset established the standard set of 14 thoracic disease labels that all major benchmarks now use: Atelectasis, Cardiomegaly, Consolidation, Edema, Effusion, Emphysema, Fibrosis, Hernia, Infiltration, Mass, Nodule, Pleural Thickening, Pneumonia, and Pneumothorax.


Chest X-ray models output probability scores for each of 14 standard pathologies. A threshold (typically 50%) determines positive predictions.
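Thresholding the scores is a one-liner. The 50% default is common, though in practice thresholds are often tuned per pathology on a validation set; the scores below are illustrative:

```python
# Made-up per-pathology probabilities from a classifier head
probs = {"Cardiomegaly": 0.81, "Edema": 0.12, "Pneumonia": 0.55}

THRESHOLD = 0.5  # common default; often tuned per pathology in practice
positives = [name for name, p in probs.items() if p >= THRESHOLD]
print(positives)  # ['Cardiomegaly', 'Pneumonia']
```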
- **CheXpert** — 224,316 chest radiographs from 65,240 patients with 14 pathology labels. Includes uncertainty labels and expert radiologist annotations for the validation set. The gold standard for chest X-ray classification.
- **MIMIC-CXR** — 377,110 chest X-ray images from 227,835 studies of 65,379 patients with free-text radiology reports. The largest publicly available chest X-ray dataset with paired image-text data.
- **NIH ChestX-ray14** — 112,120 frontal-view chest X-ray images from 30,805 unique patients with 14 disease labels extracted from radiology reports using NLP. The foundational benchmark for chest X-ray AI.
- **VinDr-CXR** — 18,000 chest X-ray scans with radiologist annotations for 22 local labels and 6 global labels. Each image annotated by 3 radiologists with bounding-box localization.
Have you achieved better results on CheXpert or published a new chest X-ray model? Help the community by sharing your verified results.