MiniMax M2.1: The New SWE-bench Leader at 90% Lower Cost
A 229B parameter Mixture-of-Experts model achieves 74.0% on SWE-bench Verified while running only 10B active parameters per token. MiniMax has delivered what may be the most cost-efficient frontier coding model to date.
On December 23, 2025, MiniMax released M2.1, a Mixture-of-Experts language model that immediately claimed the top position on SWE-bench Verified. The model achieves 74.0% on the benchmark, surpassing DeepSeek V3.2 (73.1%), Kimi K2 (71.3%), and Claude Sonnet 4.5 (68.2%). What makes this result significant is not just the raw score, but the efficiency: MiniMax M2.1 processes tokens using only 10B active parameters from its 229B total, enabling a 90% cost reduction compared to Claude.
MiniMax positions M2.1 as a "Digital Employee" optimized for agentic coding and tool use. The model includes Interleaved Thinking capability for long-horizon planning, making it well-suited for complex software engineering tasks that require sustained reasoning across multiple files and repositories.
Technical Specifications
| Specification | Value |
|---|---|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 229 billion |
| Active Parameters | 10 billion per token |
| Release Date | December 23, 2025 |
| Special Capabilities | Interleaved Thinking for long-horizon planning |
| Tensor Formats | FP8, BF16, F32 |
| Deployment Frameworks | SGLang, vLLM, Transformers |
| API Cost | $0.30 per 1M tokens |
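For a rough sense of the hardware those parameter counts imply, the back-of-the-envelope sketch below converts the totals into weight-memory footprints for each listed tensor format. It ignores KV cache, activations, and serving overhead, so treat the numbers as lower bounds rather than sizing guidance:

```python
# Back-of-the-envelope weight memory for MiniMax M2.1 in the listed tensor formats.
# Ignores KV cache, activations, and runtime overhead; real deployments need headroom.
TOTAL_PARAMS = 229e9   # all experts must be resident in memory
ACTIVE_PARAMS = 10e9   # parameters used per token (affects compute, not memory)

BYTES_PER_PARAM = {"FP8": 1, "BF16": 2, "F32": 4}

for fmt, nbytes in BYTES_PER_PARAM.items():
    weight_gb = TOTAL_PARAMS * nbytes / 1e9
    print(f"{fmt}: ~{weight_gb:.0f} GB of weights")

# Note: MoE lowers per-token compute (10B active), but all 229B parameters
# still have to fit in memory across the serving cluster.
```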
Benchmark Results: MiniMax M2.1 vs Claude Sonnet 4.5
| Benchmark | MiniMax M2.1 | Claude Sonnet 4.5 | Delta |
|---|---|---|---|
| SWE-bench Verified (real GitHub issue resolution) | 74.0% | 68.2% | +5.8 pts |
| VIBE (full-stack application development) | 88.6% | 86.1% | +2.5 pts |
| Multi-SWE-Bench (multi-repository engineering tasks) | 49.4% | 45.7% | +3.7 pts |
MiniMax M2.1 outperforms Claude Sonnet 4.5 across all three agentic coding benchmarks while costing 90% less per token.
Cost Analysis
API Pricing Comparison
| Model | API Cost (per 1M tokens) |
|---|---|
| MiniMax M2.1 | $0.30 |
| DeepSeek V3.2 | $0.27 |
| Kimi K2 | $0.60 |
| GPT-4o | $2.50 |
| Claude Sonnet 4.5 | $3.00 |
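To make the headline 90% figure concrete, the sketch below works out monthly API spend at an assumed volume of 100 million tokens, using the list prices cited in this article. Real bills will differ if a provider prices input and output tokens separately:

```python
# Rough monthly API cost at an assumed volume of 100M tokens,
# using the per-1M-token list prices cited in this article.
MONTHLY_TOKENS = 100_000_000  # hypothetical workload

price_per_1m = {
    "MiniMax M2.1": 0.30,
    "DeepSeek V3.2": 0.27,
    "Kimi K2": 0.60,
    "GPT-4o": 2.50,
    "Claude Sonnet 4.5": 3.00,
}

for model, price in price_per_1m.items():
    monthly = MONTHLY_TOKENS / 1_000_000 * price
    print(f"{model}: ${monthly:,.2f}/month")

# MiniMax vs Claude at this volume: $30 vs $300 per month -> a 90% reduction.
```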
Why MoE Enables Low Cost
The Mixture-of-Experts architecture activates only a subset of parameters for each token. With 229B total parameters but only 10B active, MiniMax M2.1 achieves frontier performance while using roughly 4% of the per-token compute of an equally sized dense model (10B of 229B parameters participate in each forward pass). This architectural efficiency translates directly into lower API costs.
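To see the mechanism concretely, here is a minimal top-k MoE layer in PyTorch. The expert count, hidden sizes, and top-k value are illustrative placeholders, not MiniMax M2.1's actual configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Minimal top-k Mixture-of-Experts feed-forward layer (illustrative only)."""

    def __init__(self, d_model=512, d_ff=2048, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x):
        # x: [num_tokens, d_model]
        gate_logits = self.router(x)                                # [num_tokens, num_experts]
        weights, expert_ids = gate_logits.topk(self.top_k, dim=-1)  # choose top_k experts per token
        weights = F.softmax(weights, dim=-1)                        # renormalize over the chosen experts

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_ids[:, slot] == e                     # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(8, 512)
print(layer(tokens).shape)  # torch.Size([8, 512]); only 2 of 16 experts ran for each token
```

With `top_k=2` of 16 experts, each token touches only a small slice of the expert weights per forward pass; scaled up, this is the same routing principle that lets M2.1 use 10B of its 229B parameters per token.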
Deployment Example
MiniMax M2.1 supports deployment via SGLang, vLLM, and Transformers. Below is an example using the Transformers library with FP8 quantization for efficient inference:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load MiniMax M2.1; "auto" keeps the weights in the checkpoint's published
# precision (FP8 here), as declared by the model config.
model_name = "minimax/MiniMax-M2.1-229B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",      # follow the FP8 checkpoint for efficient inference
    device_map="auto",       # shard across available GPUs
    trust_remote_code=True,  # MoE architecture ships custom modeling code
)

# Example: code-generation prompt for an agentic bug fix
prompt = """You are a software engineer. Fix the following bug:
File: utils/parser.py
Issue: JSON parsing fails on nested arrays with null values.
Provide the corrected code."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate with Interleaved Thinking enabled
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.1,
    do_sample=True,
    use_interleaved_thinking=True,  # model-specific flag for long-horizon planning;
                                    # confirm the exact name against MiniMax's model card
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

For production deployments, SGLang or vLLM are recommended for higher throughput. MiniMax also provides an official API endpoint.
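As a concrete starting point, a minimal vLLM sketch might look like the following; the model identifier and parallelism settings are assumptions, so check MiniMax's model card for the recommended values:

```python
from vllm import LLM, SamplingParams

# Assumed Hugging Face identifier; verify against MiniMax's model card.
llm = LLM(
    model="minimax/MiniMax-M2.1-229B",
    trust_remote_code=True,   # MoE architecture ships custom modeling code
    tensor_parallel_size=8,   # assumption: eight GPUs to hold the 229B FP8 weights
)

params = SamplingParams(temperature=0.1, max_tokens=2048)
prompt = "Fix the JSON parsing bug in utils/parser.py for nested arrays with null values."

outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```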
Competitive Landscape
MiniMax M2.1 enters a crowded field of frontier coding models. Here is how it compares to other leading options on SWE-bench Verified:
| Model | SWE-bench | Parameters | Cost | License |
|---|---|---|---|---|
| MiniMax M2.1 (SOTA) | 74.0% | 229B (10B active) | $0.30/1M | Permissive |
| DeepSeek V3.2 | 73.1% | 671B MoE | $0.27/1M | MIT |
| Kimi K2 | 71.3% | 1T MoE | $0.60/1M | Commercial |
| Claude Sonnet 4.5 | 68.2% | Unknown | $3.00/1M | Commercial |
| GPT-4o | 33.2% | Unknown | $2.50/1M | Commercial |
Key differentiator: MiniMax M2.1's combination of top benchmark performance and low cost makes it the clear choice for cost-sensitive agentic applications. DeepSeek V3.2 offers similar pricing but slightly lower performance. Claude Sonnet 4.5 remains competitive for users who prioritize Anthropic's safety research and enterprise support.
Recommendations
When to Use MiniMax M2.1
- High-volume agentic coding tasks where cost is critical
- Multi-file repository navigation and bug fixing
- Full-stack application development workflows
- Self-hosted deployments with SGLang or vLLM
- Multilingual software development projects
When to Consider Alternatives
- Enterprise environments requiring vendor support (Claude)
- Mathematical reasoning tasks (consider GLM-4.7)
- Strict safety/alignment requirements (Claude)
- OpenAI ecosystem integrations (GPT-4o/o1)
Conclusion
MiniMax M2.1 represents a significant milestone in the democratization of frontier AI capabilities. By achieving the top position on SWE-bench Verified at 90% lower cost than Claude, it demonstrates that the Mixture-of-Experts architecture can deliver exceptional performance without proportional compute costs.
For teams building agentic coding systems, automated code review pipelines, or developer productivity tools, MiniMax M2.1 offers the best cost-performance ratio currently available. The support for SGLang, vLLM, and Transformers provides deployment flexibility, while the Interleaved Thinking capability enables complex multi-step reasoning tasks.
As the competitive landscape for coding models intensifies, we expect continued rapid improvement. Track the latest SWE-bench results and model comparisons on CodeSOTA.