
Mask Generation

Mask generation produces pixel-precise segmentation masks for objects. Meta's Segment Anything (SAM, 2023) transformed it from a specialized task into a foundational capability: trained on 11M images with over 1B masks, SAM demonstrated that a single promptable model (click a point, draw a box, or provide text) could segment virtually anything. SAM 2 (2024) extended this to video with real-time tracking, while EfficientSAM and FastSAM address the original's computational cost. This was segmentation's "foundation model" moment, analogous to what GPT-3 was for NLP.

1 dataset · 4 results · canonical metric: mIoU

Canonical Benchmark

SA-1B

Segment Anything benchmark with 1B+ masks across 11M images

Primary metric: mIoU
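Mask IoU is the ratio of intersection to union between a predicted and a ground-truth mask, and mIoU averages it over mask pairs. A minimal NumPy sketch (function names are illustrative, not from any benchmark codebase):

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union else 1.0

def mean_iou(pairs) -> float:
    """mIoU: average IoU over (pred, gt) mask pairs."""
    return sum(mask_iou(p, g) for p, g in pairs) / len(pairs)

# Toy example: prediction covers 4 px, ground truth covers 8 px,
# all 4 predicted pixels are correct -> IoU = 4 / 8.
pred = np.zeros((4, 4), dtype=bool); pred[:2, :2] = True
gt = np.zeros((4, 4), dtype=bool); gt[:2, :] = True
print(mask_iou(pred, gt))  # → 0.5
```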

Top Results

Leading models on SA-1B.

| Rank | Model | mIoU | Year | Source |
|------|-------|------|------|--------|
| 1 | SAM 2 (Hiera-L) | 62.2 | 2024 | paper |
| 2 | SAM (ViT-H) | 58.1 | 2023 | paper |
| 3 | FastSAM | 57.1 | 2023 | paper |
| 4 | EfficientSAM | 55.5 | 2023 | paper |

All datasets

1 dataset tracked for this task.


Run Inference

Looking to run a model? Hugging Face hosts inference for this task type.

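As one example, the `transformers` library exposes a `"mask-generation"` pipeline that wraps SAM. A minimal sketch, assuming `transformers`, `torch`, and `pillow` are installed; the checkpoint name is one published SAM variant, and weights are downloaded on first use:

```python
# Sketch: automatic mask generation with SAM via the transformers
# "mask-generation" pipeline. The synthetic image below stands in
# for a real photo.
from transformers import pipeline
from PIL import Image, ImageDraw

generator = pipeline("mask-generation", model="facebook/sam-vit-base")

# A simple test image: a dark square on a white background.
image = Image.new("RGB", (256, 256), "white")
ImageDraw.Draw(image).rectangle([64, 64, 192, 192], fill="black")

outputs = generator(image, points_per_batch=64)

# outputs["masks"] holds one boolean array per segmented region.
for mask in outputs["masks"]:
    print(mask.shape, mask.sum())
```

For interactive use, the same pipeline also accepts point or box prompts via the underlying SAM processor rather than the automatic grid of points used here.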