Model card
Kimi K2.5.
Moonshot AI · API
§ 01 · Benchmarks
Every benchmark Kimi K2.5 has a recorded score for.
| # | Benchmark | Area · Task | Metric | Value | Rank | Date | Source |
|---|---|---|---|---|---|---|---|
| 01 | React Native Evals | Mobile Development · React Native Code Generation | navigation-satisfaction | 93.3% | #5 | — | source ↗ |
| 02 | React Native Evals | Mobile Development · React Native Code Generation | async-state-satisfaction | 77.7% | #7 | — | source ↗ |
| 03 | React Native Evals | Mobile Development · React Native Code Generation | requirement-satisfaction | 74.9% | #7 | — | source ↗ |
| 04 | React Native Evals | Mobile Development · React Native Code Generation | animation-satisfaction | 59.4% | #8 | — | source ↗ |
| 05 | MMLU | Reasoning · Commonsense Reasoning | accuracy | 86.0% | #34 | 2025-12-01 | source ↗ |
The Rank column shows this model's position among all models scored on the same benchmark and metric. A rank of #1 marks the current SOTA. Rows are sorted by rank, then by newest result.
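The ordering rule described above (rank first, then newest result) can be sketched in a few lines of Python. This is an illustrative sketch, not Codesota's implementation: the tuple layout and the `sort_key` helper are hypothetical, and the handling of undated rows and of ties within a rank is an assumption, since the page does not state it.

```python
from datetime import date

# Rows taken from the table above: (metric, rank, date).
# None stands for rows where the table shows no date.
rows = [
    ("accuracy", 34, date(2025, 12, 1)),
    ("animation-satisfaction", 8, None),
    ("async-state-satisfaction", 7, None),
    ("navigation-satisfaction", 5, None),
    ("requirement-satisfaction", 7, None),
]


def sort_key(row):
    """Order by rank ascending, then by newest date first."""
    _metric, rank, recorded = row
    # Assumption: undated rows sort after dated rows within the same rank.
    undated = recorded is None
    # Negating the ordinal makes newer dates sort earlier.
    newest_first = 0 if undated else -recorded.toordinal()
    return (rank, undated, newest_first)


# sorted() is stable, so rows that tie on every key keep their input order.
ordered = sorted(rows, key=sort_key)

for metric, rank, recorded in ordered:
    print(f"#{rank:<3} {metric:<26} {recorded or 'undated'}")
```

Because both rank #7 rows are undated, their relative order here comes only from the input order. The page may well apply a further tie-break, such as the score value, that is not documented.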
§ 02 · Strengths by area
Where Kimi K2.5 actually performs.
§ 04 · Related models
Other Moonshot AI models scored on Codesota.
§ 05 · Sources & freshness
Where these numbers come from.
Callstack Incubator: 4 results
codesota-shadow-mmlu: 1 result
4 of 5 rows marked verified.