Mask Generation
Mask generation produces pixel-precise segmentation masks for objects. Meta's Segment Anything Model (SAM, 2023) turned it from a specialized task into a foundational capability. Trained on 11M images with 1B+ masks, SAM showed that a single promptable model could segment almost any object from a clicked point or a drawn box; text prompts were explored in the paper but are not part of the released model. SAM 2 (2024) extended the approach to video with real-time tracking, while EfficientSAM and FastSAM reduce the original's computational cost. SAM was the "foundation model" moment for segmentation, analogous to what GPT-3 was for NLP.
SA-1B
Segment Anything benchmark with 1B+ masks across 11M images
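A predicted mask is commonly scored against a reference mask with intersection over union (IoU). This page does not state which metric its leaderboard uses, so the following is only a minimal sketch of that general idea. It assumes binary masks stored as nested lists of 0/1, and the name `mask_iou` is hypothetical rather than taken from any SAM codebase.

```python
def mask_iou(pred, truth):
    """Intersection over union of two same-shaped binary masks.

    Masks are nested lists of 0/1 values (an assumption made for
    illustration; real pipelines usually use arrays or RLE encodings).
    """
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1

    # Two empty masks agree perfectly; return 1.0 to avoid dividing by zero.
    return 1.0 if union == 0 else intersection / union


# Toy 3x3 example: 3 pixels overlap, 5 pixels are covered by either mask.
pred = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
truth = [[0, 1, 0],
         [0, 1, 1],
         [0, 0, 1]]
print(mask_iou(pred, truth))  # 0.6
```

Averaging this score over many image and prompt pairs gives a mean IoU. That average is one plausible summary number for comparing mask generation models.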
Top 10
Leading models on SA-1B.
All datasets
1 dataset tracked for this task.