hellaswag

OCR benchmark

Total Results: 4
Models Tested: 4
Metrics: 1
Last Updated: 2025-12-19

accuracy (higher is better)

Commonsense NLI. Top models now approach human performance (95.6%).

Rank Model Score Source
1 gpt-4o 95.3 openai-blog
2 gemini-15-pro 92.5 google-blog
3 claude-35-sonnet 89 anthropic-blog
4 llama-3-70b 88 meta-blog