
Mask Generation

Mask generation produces pixel-precise segmentation masks for objects. Meta's Segment Anything (SAM, 2023) transformed it from a specialized task into a foundational capability: trained on 11M images with over 1B masks, SAM demonstrated that a single promptable model (click a point, draw a box, or provide text) could segment virtually anything. SAM 2 (2024) extended this to video with real-time tracking, while EfficientSAM and FastSAM address the original's computational cost. This was segmentation's "foundation model" moment, analogous to what GPT-3 was for NLP.

1 dataset · 4 results · canonical metric: mIoU

Canonical Benchmark

SA-1B

Segment Anything benchmark with 1B+ masks across 11M images

Primary metric: mIoU
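Mask IoU is the ratio of intersection to union between a predicted and a ground-truth mask, and mIoU averages it over mask pairs. A minimal NumPy sketch (function names are illustrative, not from any benchmark codebase):

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union else 1.0

def mean_iou(pairs) -> float:
    """mIoU: average IoU over (pred, gt) mask pairs."""
    return sum(mask_iou(p, g) for p, g in pairs) / len(pairs)

# Toy example: prediction covers 4 px, ground truth covers 8 px,
# all 4 predicted pixels are correct -> IoU = 4 / 8.
pred = np.zeros((4, 4), dtype=bool); pred[:2, :2] = True
gt = np.zeros((4, 4), dtype=bool); gt[:2, :] = True
print(mask_iou(pred, gt))  # → 0.5
```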

Top Results

Leading models on SA-1B.

| Rank | Model | mIoU | Year | Source |
|------|-------|------|------|--------|
| 1 | SAM 2 (Hiera-L) | 62.2 | 2024 | paper |
| 2 | SAM (ViT-H) | 58.1 | 2023 | paper |
| 3 | FastSAM | 57.1 | 2023 | paper |
| 4 | EfficientSAM | 55.5 | 2023 | paper |

All datasets

1 dataset tracked for this task.


Run Inference

Looking to run a model? Hugging Face hosts inference for this task type.

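As one example, the `transformers` library exposes a `"mask-generation"` pipeline that wraps SAM. A minimal sketch, assuming `transformers`, `torch`, and `pillow` are installed; the checkpoint name is one published SAM variant, and weights are downloaded on first use:

```python
# Sketch: automatic mask generation with SAM via the transformers
# "mask-generation" pipeline. The synthetic image below stands in
# for a real photo.
from transformers import pipeline
from PIL import Image, ImageDraw

generator = pipeline("mask-generation", model="facebook/sam-vit-base")

# A simple test image: a dark square on a white background.
image = Image.new("RGB", (256, 256), "white")
ImageDraw.Draw(image).rectangle([64, 64, 192, 192], fill="black")

outputs = generator(image, points_per_batch=64)

# outputs["masks"] holds one boolean array per segmented region.
for mask in outputs["masks"]:
    print(mask.shape, mask.sum())
```

For interactive use, the same pipeline also accepts point or box prompts via the underlying SAM processor rather than the automatic grid of points used here.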