AI systems now match or exceed radiologist performance in detecting pneumonia, COVID-19, and other thoracic diseases. Track the state of the art in chest X-ray classification.
From raw DICOM images to clinical predictions: understanding how chest X-ray AI works is essential for safe deployment.

Raw chest X-rays arrive as DICOM files. Preprocessing includes contrast enhancement, resizing to 224x224, and normalization to zero mean and unit variance.
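The preprocessing steps above can be sketched directly in NumPy. Function names here are illustrative, not from any particular library; production pipelines typically read pixels with pydicom and resize with bilinear interpolation, but the arithmetic is the same:

```python
import numpy as np

def resize_nearest(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Nearest-neighbour resize to size x size (stand-in for bilinear)."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols]

def preprocess(pixels: np.ndarray) -> np.ndarray:
    """Contrast-stretch, resize to 224x224, then standardize."""
    img = pixels.astype(np.float32)
    # Contrast enhancement via min-max stretch to [0, 1]
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)
    img = resize_nearest(img, 224)
    # Normalize to zero mean and unit variance
    img = (img - img.mean()) / (img.std() + 1e-8)
    return img

# 12-bit DICOM-like pixel array (synthetic placeholder)
fake_dicom = np.random.randint(0, 4096, size=(2048, 2500))
x = preprocess(fake_dicom)
print(x.shape)  # (224, 224)
```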
Most models use a DenseNet-121 backbone pretrained on ImageNet, though Vision Transformers and CLIP-based vision-language models are becoming dominant.
Output is typically 14 binary labels for conditions like Atelectasis, Cardiomegaly, Consolidation, Edema, Pleural Effusion, and Pneumonia.
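Because the 14 labels are independent binary predictions, the output head applies a sigmoid per pathology rather than a softmax over classes. A minimal sketch with made-up logits for six of the labels:

```python
import numpy as np

# Illustrative subset of the 14 pathology labels
LABELS = ["Atelectasis", "Cardiomegaly", "Consolidation",
          "Edema", "Pleural Effusion", "Pneumonia"]

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

# One raw score per label, as a classifier head would produce (values made up)
logits = np.array([1.2, -0.4, 0.1, -2.0, 2.3, 0.8])
probs = sigmoid(logits)  # each label scored independently in (0, 1)

for name, p in zip(LABELS, probs):
    print(f"{name:16s} {p:.3f}")
```

Note that the probabilities need not sum to 1: a patient can have several findings at once, which is exactly why multi-label sigmoids are used instead of a softmax.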

Grad-CAM (Gradient-weighted Class Activation Mapping) reveals which image regions the model relies on for each pathology; the example overlay uses a real chest X-ray from the COVID-19 Image Data Collection.
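The Grad-CAM computation itself is compact: given the last conv layer's feature maps A and the gradient of the target pathology score with respect to them, the heatmap is ReLU(Σₖ αₖAₖ), where αₖ is the spatially averaged gradient for channel k. A sketch with synthetic tensors (real use backpropagates through the CNN, e.g. via PyTorch hooks; the gradients here are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.random((512, 7, 7))    # A: 512 channels, 7x7 spatial grid
grads = rng.standard_normal((512, 7, 7))  # dY_pathology / dA (placeholder)

alpha = grads.mean(axis=(1, 2))           # global-average-pool the gradients
cam = np.maximum((alpha[:, None, None] * feature_maps).sum(axis=0), 0.0)  # ReLU
cam = cam / (cam.max() + 1e-8)            # scale to [0, 1] for overlay

print(cam.shape)  # (7, 7) heatmap, upsampled to image size for display
```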
Stanford's CheXpert is the gold-standard benchmark for chest X-ray classification; the leaderboard below reports mean AUC across the five competition pathologies.
| Rank | Model | Organization | Mean AUC | Architecture |
|---|---|---|---|---|
| #1 | CheXpert AUC Maximizer | Stanford | 93.0% | DenseNet-121 Ensemble |
| #2 | BioViL | Microsoft | 89.1% | Vision-Language Transformer |
| #3 | CheXzero | Harvard/MIT | 88.6% | CLIP-based Vision-Language |
| #4 | GLoRIA | Stanford | 88.2% | Vision-Language (Local + Global) |
| #5 | MedCLIP | Research | 87.8% | CLIP-based Vision-Language |
| #6 | TorchXRayVision | Cohen Lab | 87.4% | DenseNet-121 / ResNet |
| #7 | DenseNet-121 (Chest X-ray) | Research | 86.5% | DenseNet-121 |
How well do models generalize across different chest X-ray benchmarks? (AUC, %; a dash means no reported result.)
| Model | CheXpert | NIH ChestX-ray14 | MIMIC-CXR | VinDr-CXR |
|---|---|---|---|---|
| CheXpert AUC Maximizer | 93.0 | - | - | - |
| BioViL | 89.1 | - | - | - |
| CheXzero | 88.6 | - | 89.2 | - |
| GLoRIA | 88.2 | - | - | - |
| MedCLIP | 87.8 | - | - | - |
| TorchXRayVision | 87.4 | 85.8 | 86.3 | 87.9 |
| DenseNet-121 (Chest X-ray) | 86.5 | 82.6 | - | - |
| CheXNet | - | 84.1 | - | - |
Traditional CNNs (CheXNet, DenseNet) dominated until 2022. Now, CLIP-based models like CheXzero and MedCLIP are achieving competitive results with zero-shot transfer.
These models learn from paired image-text data (X-rays + radiology reports), enabling them to classify new conditions without retraining. GLoRIA and BioViL further improve by learning local region-text alignments.
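CLIP-style zero-shot classification can be sketched in a few lines: embed the image and a pair of text prompts ("pneumonia" vs. "no pneumonia") in a shared space, then compare cosine similarities. Random vectors stand in here for the real image and text encoders, and the prompts are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Placeholder embeddings in a shared 512-d space (a real model would
# produce these from the X-ray and from the prompt text)
image_emb = normalize(rng.standard_normal(512))
prompts = ["pneumonia", "no pneumonia"]
text_embs = normalize(rng.standard_normal((2, 512)))  # one row per prompt

sims = text_embs @ image_emb   # cosine similarity, one score per prompt
z = sims * 100.0               # CLIP-style temperature scaling
z -= z.max()                   # numerical stability for the softmax
probs = np.exp(z) / np.exp(z).sum()
print(dict(zip(prompts, probs.round(3))))
```

Swapping in a new prompt pair ("edema" vs. "no edema") classifies a condition the model was never explicitly trained on, which is the zero-shot transfer described above.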
Unlike ImageNet, chest X-ray labels are extracted from radiology reports using NLP, introducing significant noise: negated or uncertain mentions ("no effusion", "cannot exclude pneumonia") are easily mislabeled.
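A toy, negation-aware labeler (our own simplification for illustration, not the actual CheXpert labeler) shows both the idea and the failure mode:

```python
import re

# Negation cues that flip a mention from positive to negative
NEGATIONS = ("no ", "without ", "negative for ")

def label_report(report: str, finding: str) -> int:
    """Return 1 if the finding is mentioned without a negation cue, else 0."""
    for sentence in re.split(r"[.;]", report.lower()):
        if finding in sentence and not any(neg in sentence for neg in NEGATIONS):
            return 1
    return 0

print(label_report("Large right pleural effusion. No pneumothorax.", "effusion"))  # 1
print(label_report("No focal consolidation or effusion.", "effusion"))             # 0
# Failure mode -> label noise: uncertainty is counted as a positive finding
print(label_report("Cannot exclude small effusion.", "effusion"))                  # 1
```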
The NIH ChestX-ray14 dataset established the standard set of 14 thoracic disease labels that all major benchmarks now use: Atelectasis, Cardiomegaly, Consolidation, Edema, Effusion, Emphysema, Fibrosis, Hernia, Infiltration, Mass, Nodule, Pleural Thickening, Pneumonia, and Pneumothorax.


Chest X-ray models output probability scores for each of 14 standard pathologies. A threshold (typically 50%) determines positive predictions.
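Thresholding the scores is a one-liner. The 50% default is common, though in practice thresholds are often tuned per pathology on a validation set; the scores below are illustrative:

```python
# Made-up per-pathology probabilities from a classifier head
probs = {"Cardiomegaly": 0.81, "Edema": 0.12, "Pneumonia": 0.55}

THRESHOLD = 0.5  # common default; often tuned per pathology in practice
positives = [name for name, p in probs.items() if p >= THRESHOLD]
print(positives)  # ['Cardiomegaly', 'Pneumonia']
```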
- **CheXpert** — 224,316 chest radiographs from 65,240 patients with 14 pathology labels. Includes uncertainty labels and expert radiologist annotations for the validation set. The gold standard for chest X-ray classification.
- **MIMIC-CXR** — 377,110 chest X-ray images from 227,835 studies of 65,379 patients with free-text radiology reports. The largest publicly available chest X-ray dataset with paired image-text data.
- **NIH ChestX-ray14** — 112,120 frontal-view chest X-ray images from 30,805 unique patients with 14 disease labels extracted from radiology reports using NLP. The foundational benchmark for chest X-ray AI.
- **VinDr-CXR** — 18,000 chest X-ray scans with radiologist annotations for 22 local labels and 6 global labels. Each image annotated by 3 radiologists with bounding-box localization.
Have you achieved better results on CheXpert or published a new chest X-ray model? Help the community by sharing your verified results.