TRELLIS.2: Production-Ready 3D Assets in 3 Seconds
Microsoft Research has released a 4B-parameter image-to-3D model that generates game-ready PBR assets from a single image. The novel O-Voxel representation enables resolutions up to 1536^3 with full transparency and arbitrary topology support.
Key Takeaways
- 4B parameters (2x TRELLIS v1), MIT license for commercial use
- 512^3 resolution in 3 seconds, 1024^3 in 17 seconds on an H100
- Full PBR output: base color, roughness, metallic, opacity/transparency
- User studies: 4.55/5 satisfaction, 4.82/5 intent alignment
What is TRELLIS.2?
TRELLIS.2 is Microsoft Research's latest image-to-3D generation model, a direct continuation of its CVPR'25 spotlight work, TRELLIS. The model takes a single image as input and generates complete 3D assets with physically-based rendering (PBR) materials, suitable for immediate use in game engines and production pipelines.
Unlike previous approaches that struggle with complex topology, TRELLIS.2 handles open surfaces, non-manifold geometry, and internal structures. This makes it the first truly production-ready open-source image-to-3D model for game development and digital asset creation.
Technical Architecture: O-Voxel Representation
The core innovation in TRELLIS.2 is the O-Voxel (omni-voxel) representation, a field-free sparse voxel structure that changes how 3D geometry is represented during generation. Traditional approaches use implicit neural fields (NeRF, SDF), which require expensive per-point queries during mesh extraction.
How O-Voxel Works
O-Voxel represents 3D shapes as a sparse set of occupied voxels, each storing:
- Occupancy probability: Whether the voxel contains surface geometry
- Surface normal: Orientation of the local surface
- PBR attributes: Base color, roughness, metallic, and opacity values
This sparse representation enables scaling to 1536^3 resolution, approximately 3.6 billion potential voxels, while only storing and processing the occupied subset. The sparsity pattern itself encodes the shape, eliminating the need for expensive field evaluations.
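To make the sparsity argument concrete, here is a minimal sketch of a sparse voxel container in Python. The `SparseVoxels` class, its six-channel attribute layout, and the `sphere_shell` toy shape are hypothetical illustrations, not the TRELLIS.2 data structure; the only figure taken from the article is the 1536^3 grid size.

```python
import numpy as np


def dense_voxel_count(resolution: int) -> int:
    """Number of cells in a dense resolution^3 grid."""
    return resolution ** 3


class SparseVoxels:
    """Hypothetical container: integer coordinates plus per-voxel attributes."""

    def __init__(self, coords: np.ndarray, attrs: np.ndarray, resolution: int):
        if coords.ndim != 2 or coords.shape[1] != 3:
            raise ValueError("coords must have shape (N, 3)")
        if attrs.shape[0] != coords.shape[0]:
            raise ValueError("attrs needs one row per occupied voxel")
        self.coords = coords.astype(np.int32)
        self.attrs = attrs.astype(np.float32)
        self.resolution = resolution

    def occupancy_fraction(self) -> float:
        """Share of the dense grid that is actually stored."""
        return self.coords.shape[0] / dense_voxel_count(self.resolution)


def sphere_shell(resolution: int, radius_frac: float = 0.4) -> SparseVoxels:
    """Toy surface: keep only voxels within half a cell of a sphere's surface."""
    axis = np.arange(resolution)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    center = (resolution - 1) / 2
    dist = np.sqrt((x - center) ** 2 + (y - center) ** 2 + (z - center) ** 2)
    mask = np.abs(dist - radius_frac * resolution) <= 0.5
    coords = np.argwhere(mask)
    # Assumed layout: base color (3), roughness, metallic, opacity.
    attrs = np.zeros((coords.shape[0], 6), dtype=np.float32)
    return SparseVoxels(coords, attrs, resolution)


shell = sphere_shell(64)
print(dense_voxel_count(1536))  # 3623878656, the "3.6 billion" figure
print(f"{shell.occupancy_fraction():.2%} of the 64^3 grid is stored")
```

Because a surface occupies roughly a two-dimensional slice of a three-dimensional grid, the stored fraction shrinks as resolution grows, which is what makes very high resolutions affordable.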
Sparse Compression VAE
TRELLIS.2 introduces a Sparse Compression Variational Autoencoder (SC-VAE) with 16x spatial downsampling. This compresses the O-Voxel representation into a compact latent space where the diffusion process operates. At 1024^3 resolution, this produces approximately 9,600 tokens, making generation tractable on current hardware.
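The token figure can be sanity-checked with a little arithmetic. The sketch below assumes the 16x factor applies per axis and that each token corresponds to one occupied latent cell; both are assumptions made for illustration, and `latent_grid_cells` is a hypothetical helper.

```python
def latent_grid_cells(resolution: int, downsample: int = 16) -> int:
    """Cells in the dense latent grid after per-axis downsampling (assumed)."""
    if resolution % downsample != 0:
        raise ValueError("resolution must be divisible by the downsample factor")
    side = resolution // downsample
    return side ** 3


cells = latent_grid_cells(1024)  # 64 per axis, 64^3 = 262,144 cells
tokens = 9_600                   # approximate figure quoted in the article
print(f"dense latent cells: {cells}")
print(f"occupied fraction:  {tokens / cells:.1%}")
```

Under these assumptions only a few percent of the latent grid carries tokens, so the transformer processes thousands of tokens rather than hundreds of thousands.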
Rectified Flow Transformer Backbone
The generation backbone is a transformer trained with rectified flow, a flow-based formulation closely related to diffusion that encourages straighter sampling trajectories. Straighter trajectories allow high-quality generation in fewer sampling steps than traditional DDPM or DDIM sampling.
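Why straight trajectories need fewer steps can be seen in a toy Euler sampler. In the sketch below, a closed-form velocity field for a single known target stands in for the trained transformer; `ideal_velocity` and `euler_sample` are hypothetical names, and nothing here reflects the actual TRELLIS.2 sampler.

```python
import numpy as np


def ideal_velocity(x: np.ndarray, t: float, target: np.ndarray) -> np.ndarray:
    """Velocity along the straight path x_t = (1 - t) * x0 + t * target.

    Stand-in for a learned network: a real model predicts this from (x, t).
    """
    return (target - x) / (1.0 - t)


def euler_sample(x0: np.ndarray, target: np.ndarray, num_steps: int) -> np.ndarray:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = np.asarray(x0, dtype=np.float64).copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # the last step uses t = 1 - dt, so 1 - t never reaches zero
        x = x + dt * ideal_velocity(x, t, target)
    return x


rng = np.random.default_rng(0)
noise = rng.standard_normal(8)   # starting sample
data = np.linspace(-1.0, 1.0, 8)  # toy "data" point

for steps in (1, 4, 32):
    error = np.abs(euler_sample(noise, data, steps) - data).max()
    print(f"{steps:>2} steps: max error {error:.2e}")
```

When the path is exactly straight, a single Euler step already lands on the target. Learned velocity fields are only approximately straight, so practical samplers still take multiple steps, but far fewer than a strongly curved trajectory would need.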
Performance Benchmarks
Reported generation times for TRELLIS.2 at each resolution target:
| Resolution | Generation Time | Token Count | Use Case |
|---|---|---|---|
| 512^3 | 3 seconds | ~1.2K | Real-time preview, prototyping |
| 1024^3 | 17 seconds | ~9.6K | Production assets, game development |
| 1536^3 | 60 seconds | ~32K | High-detail cinematics, VFX |
All benchmarks were measured on an NVIDIA H100 GPU. Consumer hardware (RTX 4090) takes approximately 2-3x longer.
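For batch planning, the table converts into a rough throughput estimate. The sketch below uses only the H100 timings and the 2-3x consumer slowdown quoted above; `assets_per_hour` is a hypothetical helper, and it ignores model loading, image preprocessing, and export time.

```python
# Generation times in seconds on an H100, taken from the table above.
H100_SECONDS = {512: 3.0, 1024: 17.0, 1536: 60.0}


def assets_per_hour(resolution: int, slowdown: float = 1.0) -> float:
    """Upper-bound throughput; slowdown=2.0 to 3.0 approximates an RTX 4090."""
    if slowdown <= 0:
        raise ValueError("slowdown must be positive")
    return 3600.0 / (H100_SECONDS[resolution] * slowdown)


for res in H100_SECONDS:
    h100 = assets_per_hour(res)
    consumer_low = assets_per_hour(res, slowdown=3.0)
    consumer_high = assets_per_hour(res, slowdown=2.0)
    print(f"{res}^3: H100 ~{h100:.0f}/h, RTX 4090 ~{consumer_low:.0f}-{consumer_high:.0f}/h")
```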
User Study Results
Microsoft conducted extensive user studies comparing TRELLIS.2 against competing methods:
- Overall satisfaction: 4.55/5.00
- Intent alignment: 4.82/5.00 (how well the output matches user expectations)
- Geometry quality: 4.61/5.00
- Material quality: 4.43/5.00
Comparison with Competitors
The image-to-3D space has seen rapid development, with several notable models competing for production adoption:
| Model | Parameters | PBR Support | Transparency | License |
|---|---|---|---|---|
| TRELLIS.2 | 4B | Full | Yes | MIT |
| Hunyuan3D-2 | 2B | Full | Limited | Tencent |
| Point-E (OpenAI) | 1B | No | No | MIT |
| Shap-E (OpenAI) | 300M | Basic | No | MIT |
| Wonder3D | ~1B | Partial | No | CC-BY-NC |
Key Differentiators
TRELLIS.2 distinguishes itself from competitors in several critical areas:
- Arbitrary topology: Unlike SDF-based methods, O-Voxel handles non-watertight meshes, open surfaces, and complex internal structures
- Full transparency: First open model to properly generate translucent and transparent materials (glass, liquids, ice)
- Production formats: Direct export to GLB, OBJ, STL, glTF, USDZ, and PLY without post-processing
- MIT license: Full commercial use permitted, unlike many alternatives with NC restrictions
Output Formats and PBR Pipeline
TRELLIS.2 generates complete PBR-ready assets with the following material channels:
- Base Color (Albedo): RGB diffuse color without lighting information
- Roughness: Surface smoothness from mirror (0.0) to fully rough (1.0)
- Metallic: Metallic/dielectric blend factor
- Opacity: Full alpha channel support for transparency
Supported export formats include GLB (web/games), OBJ (legacy pipelines), STL (3D printing), glTF (web), USDZ (Apple ecosystem), and PLY (point clouds/research).
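As an illustration of how these four channels map onto a common interchange format, the sketch below builds a glTF 2.0 material entry for a uniformly colored surface. The field names (`pbrMetallicRoughness`, `baseColorFactor`, `metallicFactor`, `roughnessFactor`, `alphaMode`) come from the glTF 2.0 specification; the `gltf_material` helper is hypothetical, and how TRELLIS.2 itself writes materials (most likely as textures rather than constant factors) is not covered here.

```python
import json


def gltf_material(name, base_color, roughness, metallic, opacity):
    """Build a glTF 2.0 material dict from constant PBR values in [0, 1]."""
    if len(base_color) != 3:
        raise ValueError("base_color must be an (r, g, b) triple")
    for value in (*base_color, roughness, metallic, opacity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all PBR values must lie in [0, 1]")
    material = {
        "name": name,
        "pbrMetallicRoughness": {
            # Opacity travels in the alpha component of the base color.
            "baseColorFactor": [*base_color, opacity],
            "metallicFactor": metallic,
            "roughnessFactor": roughness,
        },
    }
    if opacity < 1.0:
        # glTF defaults to OPAQUE, so transparency must be requested explicitly.
        material["alphaMode"] = "BLEND"
    return material


glass = gltf_material("glass", (0.9, 0.95, 1.0), roughness=0.05, metallic=0.0, opacity=0.3)
print(json.dumps(glass, indent=2))
```

The explicit `alphaMode` matters in practice: viewers that follow the specification ignore the alpha value of a material left in the default opaque mode.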
Use Cases for Game Developers
TRELLIS.2 addresses several pain points in game asset creation workflows:
Rapid Prototyping
At 3 seconds per asset (512^3), designers can iterate on visual concepts in near real time, generating dozens of variations from concept art sketches before committing to manual modeling.
Background Asset Generation
For open-world games requiring thousands of environmental props, TRELLIS.2 can generate variety at scale. Generate rocks, debris, vegetation, and architectural details from reference images.
Reference-Based Modeling
Use TRELLIS.2 output as a starting point for manual refinement. Generate a base mesh from concept art, then refine topology and add detail in traditional DCC tools.
Indie Game Development
Small teams without dedicated 3D artists can generate production-quality assets directly from concept images or photographs. The MIT license permits commercial game releases.
Recommendations
When to Use TRELLIS.2
- Game asset production requiring PBR materials and transparency
- Rapid prototyping where 3-second generation enables real-time iteration
- Commercial projects requiring permissive MIT licensing
- Assets with complex topology: glass objects, foliage, mechanical parts
When to Consider Alternatives
- Hero assets requiring hand-crafted topology and edge flow
- Rigged characters needing animation-ready topology
- Extremely high-poly sculpts (consider ZBrush pipelines)
- Projects without H100/A100 access (consumer GPUs are slower)
Hardware Requirements
TRELLIS.2 requires significant GPU memory for high-resolution generation:
- 512^3 resolution: 16GB VRAM minimum (RTX 4080, A4000)
- 1024^3 resolution: 40GB VRAM recommended (A100, A6000)
- 1536^3 resolution: 80GB VRAM (H100, A100 80GB)
Cloud deployment on GPU instances is recommended for teams without local hardware. RunPod, Lambda Labs, and major cloud providers offer H100 instances at reasonable rates for batch processing.
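A small helper can encode the VRAM tiers listed above when choosing a resolution for a given machine. The thresholds are copied from this section (the 40 GB figure is a recommendation rather than a hard minimum), and `max_resolution` is a hypothetical function, not part of any TRELLIS.2 tooling.

```python
from typing import Optional

# (resolution, VRAM in GB) from the list above, highest tier first.
VRAM_TIERS_GB = [(1536, 80), (1024, 40), (512, 16)]


def max_resolution(vram_gb: float) -> Optional[int]:
    """Return the highest listed resolution that fits, or None if below 16 GB."""
    for resolution, required_gb in VRAM_TIERS_GB:
        if vram_gb >= required_gb:
            return resolution
    return None


for name, vram in [("RTX 4090", 24), ("A6000", 48), ("H100", 80), ("RTX 3060", 12)]:
    choice = max_resolution(vram)
    label = f"{choice}^3" if choice else "below listed minimum"
    print(f"{name} ({vram} GB): {label}")
```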
Conclusion
TRELLIS.2 represents a significant advancement in AI-driven 3D asset generation. The combination of novel O-Voxel representation, production-ready PBR output, full transparency support, and MIT licensing makes it the current best choice for commercial game development and digital asset creation.
For teams evaluating image-to-3D solutions, TRELLIS.2 should be the primary candidate for production pipelines requiring quality, speed, and legal clarity. The 3-second generation time at 512^3 enables workflows that were previously impractical with slower alternatives.