HellaSwag
70K sentence completion problems testing commonsense natural language inference.
Benchmark Stats
Models: 4
Papers: 4
Metrics: 1
SOTA History
Not enough data to show a trend: only 4 models have reported results on this benchmark.
accuracy
Higher is better
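As a rough illustration of the metric, accuracy on a multiple-choice benchmark like this one is the fraction of problems where the model's chosen ending matches the gold ending. The sketch below assumes predictions and labels are given as integer ending indices; the function name `accuracy` and the toy data are hypothetical and are not taken from this leaderboard or from any official evaluation script.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Return the fraction of items whose predicted ending index equals the gold index."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact matches between predicted and gold ending indices.
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical toy data: each item has four candidate endings (indices 0-3).
example_predictions = [0, 2, 1, 3]
example_labels = [0, 2, 3, 3]
example_accuracy = accuracy(example_predictions, example_labels)  # 3 of 4 match
```

How a model picks an ending (for example, by scoring each candidate and taking the highest) is left out here, since this page does not specify it.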