Depth Estimation
Predict per-pixel distance to the camera from images. Critical for 3D reconstruction, AR/VR, and robotics.
How Depth Estimation Works
A technical deep-dive into depth estimation. From monocular depth prediction to 3D point cloud generation.
Real Examples
See how depth estimation transforms different scene types. The depth map uses a turbo colormap: red = close, blue = far.
Depth Estimation Types
Three approaches: monocular (single image), stereo (two cameras), and multi-view (many images).
Monocular Depth
Single image input
Stereo Depth
Two images (left/right)
Multi-view Depth
Multiple images
Relative vs Metric Depth
Relative depth: ordinal relationships ("A is closer than B")
Metric depth: actual distances ("A is 2.5 meters away")
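A relative prediction can be aligned to metric ground truth with a least-squares scale and shift, a common evaluation trick for relative-depth models. A minimal sketch with illustrative numbers (the arrays below are hypothetical, not from any real model):

```python
import numpy as np

# Hypothetical relative depth prediction and sparse metric ground truth
pred = np.array([0.2, 0.5, 0.9, 1.4])  # relative depth (unknown scale/shift)
gt = np.array([1.1, 2.0, 3.2, 4.7])    # meters, e.g. from LiDAR

# Solve for the scale and shift that best map pred onto gt (least squares)
A = np.stack([pred, np.ones_like(pred)], axis=1)
(scale, shift), *_ = np.linalg.lstsq(A, gt, rcond=None)

# Apply the recovered alignment to get metric-scale predictions
metric_pred = scale * pred + shift
```

The ordering of points is preserved by any positive scale and shift; only the alignment step pins the prediction to real-world units.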
Model Evolution
From self-supervised learning to foundation models.
How Monocular Depth Works
Neural networks learn depth cues from massive datasets.
Learned Depth Cues
Typical Architecture
Train with ground truth depth from LiDAR, RGBD sensors, or synthetic data.
Learn from stereo pairs or video sequences using view synthesis as supervision.
Pre-train on massive diverse data for zero-shot transfer to any domain.
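Supervised training with ground-truth depth often uses a scale-invariant log loss (in the style of Eigen et al.), which ignores a global scale factor between prediction and ground truth. A minimal sketch with hypothetical values:

```python
import numpy as np

# Hypothetical predicted and ground-truth depth maps (meters), flattened
pred = np.array([1.0, 2.0, 4.0])
gt = np.array([2.0, 4.0, 8.0])  # exactly 2x the prediction

# Scale-invariant log loss: penalizes log-depth errors up to a constant offset
d = np.log(pred) - np.log(gt)
loss = np.mean(d**2) - np.mean(d) ** 2
```

Because `pred` differs from `gt` only by a global scale, the loss here is zero, which is exactly the invariance this formulation is designed to provide.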
Output Formats
Different representations for different use cases.
Depth Map Visualization
Depth to Point Cloud
With camera intrinsics, back-project each pixel to 3D:
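The back-projection can be sketched with the standard pinhole model; the intrinsics and depth values below are hypothetical placeholders:

```python
import numpy as np

# Hypothetical pinhole intrinsics: focal lengths (fx, fy) and principal point (cx, cy)
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0
H, W = 480, 640

# Hypothetical metric depth map (meters); real values come from the model
depth = np.full((H, W), 2.0)

# Pixel coordinate grid
u, v = np.meshgrid(np.arange(W), np.arange(H))

# Back-project each pixel: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth
Z = depth
X = (u - cx) * Z / fx
Y = (v - cy) * Z / fy

# Stack into an N x 3 point cloud
points = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)
```

The point at the principal point (u = cx, v = cy) lands on the camera's optical axis at (0, 0, Z), which is a quick sanity check for the intrinsics.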
Applications
Where depth estimation is used in practice.
Code Examples
Get started with depth estimation in Python.
from transformers import pipeline
from PIL import Image
import numpy as np
# Load Depth Anything v2 model
pipe = pipeline(
task='depth-estimation',
model='depth-anything/Depth-Anything-V2-Large-hf'
)
# Run inference
image = Image.open('image.jpg')
result = pipe(image)
# Get depth map
depth = result['depth'] # PIL Image
depth_array = np.array(depth) # H x W array
# Normalize for visualization
depth_normalized = (depth_array - depth_array.min()) / \
(depth_array.max() - depth_array.min())
print(f'Depth shape: {depth_array.shape}')
print(f'Depth range: {depth_array.min():.2f} - {depth_array.max():.2f}')
Quick Reference
- Depth Anything v2
- MiDaS 3.1
- ZoeDepth
- Depth Pro
- UniDepth
- Marigold
Use Cases
- ✓ 3D scene reconstruction
- ✓ AR/VR applications
- ✓ Robot navigation
- ✓ Computational photography
Architectural Patterns
Monocular Depth Estimation
Predict depth from a single image using learned priors.
- + Works with any camera
- + No calibration needed
- − Scale ambiguity
- − Relative depth only
Metric Depth Estimation
Predict absolute depth in real-world units.
- + Real-world scale
- + Directly usable
- − Needs training data with ground truth
- − Domain-specific
Stereo Depth
Use stereo image pairs for triangulation.
- + Accurate
- + Physically grounded
- − Needs stereo camera
- − Calibration required
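For a calibrated stereo rig, triangulation reduces to a per-pixel formula: depth = f · B / disparity, where f is the focal length in pixels and B the baseline between the cameras. A minimal sketch with hypothetical calibration values and a toy disparity map:

```python
import numpy as np

# Hypothetical calibration: focal length (pixels) and baseline (meters)
f = 700.0
B = 0.12

# Hypothetical disparity map: pixel shift between left and right views
disparity = np.array([[70.0, 35.0],
                      [14.0, 7.0]])

# Triangulate: larger disparity means the point is closer to the cameras
depth = f * B / disparity  # meters
```

Note the inverse relationship: halving the disparity doubles the depth, which is why stereo accuracy degrades for distant objects.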
Implementations
Open Source
Depth Anything V2
Apache 2.0. State-of-the-art monocular depth; very robust.
Benchmarks
Quick Facts
- Input
- Image
- Output
- Depth Map
- Implementations
- 4 open source, 0 API
- Patterns
- 3 approaches