WinoGrande
A benchmark of 44K Winograd-style problems that require commonsense reasoning to resolve pronoun references.
Benchmark Stats
Models: 3
Papers: 3
Metrics: 1
Metric: accuracy (higher is better)
| Rank | Model | Code | Score | Paper / Source |
|---|---|---|---|---|
| 1 | gpt-4o | - | 87.5 | openai-blog |
| 2 | claude-35-sonnet | - | 85.4 | anthropic-blog |
| 3 | llama-3-70b | HF | 85.3 | meta-blog |
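
The scores above are accuracy values, which we read as the percentage of problems a model answers correctly. The following is a minimal sketch of that calculation, assuming each problem has two candidate options labeled "1" and "2" and one gold label. The `accuracy` function name and the toy data are hypothetical and are not taken from any model's evaluation code.

```python
def accuracy(predictions, gold):
    """Return the percentage of predictions that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count the problems where the predicted option equals the gold option.
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


# Hypothetical toy data: each label picks option "1" or "2" for one problem.
gold = ["1", "2", "2", "1"]
predictions = ["1", "2", "1", "1"]
print(f"{accuracy(predictions, gold):.1f}")  # 3 of 4 correct -> 75.0
```

Because every problem has two options, random guessing scores about 50 on this scale, which is useful context when comparing the leaderboard entries.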