SVAMP
1,000 elementary-level math word problems testing robustness of arithmetic reasoning.
Benchmark Stats
Models: 3
Papers: 3
Metrics: 1
SOTA History
Not enough data to show a trend: only 3 models have reported results on this benchmark.
accuracy
Higher is better
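As a rough illustration of the metric, accuracy on a word-problem benchmark is usually the fraction of problems whose predicted answer matches the reference answer. The sketch below assumes numeric answers compared within a small tolerance; the function name `accuracy` and the `tol` parameter are hypothetical and are not taken from any official SVAMP evaluation script.

```python
def accuracy(predictions, references, tol=1e-6):
    """Return the fraction of predictions that match their reference answer.

    Assumption: answers are numeric and count as correct when they differ
    from the reference by at most `tol`.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count problems whose predicted value is within tolerance of the reference.
    correct = sum(
        abs(float(p) - float(r)) <= tol
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


if __name__ == "__main__":
    # Three of the four predicted answers match their references.
    print(accuracy([8, 3.0, 12, 7], [8, 3, 10, 7]))
```

A score closer to 1.0 means more problems were answered correctly, which is why higher is better for this metric.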