SVAMP

1,000 elementary-level math word problems testing robustness of arithmetic reasoning.

Benchmark Stats

Models: 3
Papers: 3
Metrics: 1

SOTA History

Not enough data to show trend.

Only 3 models on this benchmark

accuracy

Higher is better
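
As a rough illustration of how an accuracy score like the ones listed here could be computed, the sketch below counts predictions that numerically match the reference answer and reports the result as a percentage. The function name `accuracy`, the tolerance `tol`, and the sample answers are assumptions made for illustration; this page does not state its exact scoring procedure.

```python
def accuracy(predictions, references, tol=1e-6):
    """Return the percentage of predictions that numerically match the references.

    Hypothetical helper: the matching rule (absolute tolerance on floats)
    is an assumption, not the leaderboard's documented method.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("no examples to score")
    # Count answers that agree with the reference within the tolerance.
    correct = sum(
        1
        for pred, ref in zip(predictions, references)
        if abs(float(pred) - float(ref)) <= tol
    )
    return 100.0 * correct / len(references)


# Made-up answers for four word problems; the second prediction is wrong.
score = accuracy([8, 3, 5.0, 12], [8, 4, 5, 12])
print(f"accuracy: {score:.1f}")
```

With 1,000 problems, each correct answer contributes 0.1 percentage points under this kind of scoring.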

Rank  Model             Source     Score  Year  Paper
1     gpt-4o            Editorial  93.7   2025  Source
2     claude-35-sonnet  Editorial  91.2   2025  Source
3     llama-3-70b       Editorial  89.5   2025  Source

SVAMP: Simple Variations on Arithmetic Math word Problems.

SVAMP Leaderboard | CodeSOTA