Codesota · Models · Claude 3 Opus · Anthropic · 14 results · 8 benchmarks
Model card

Claude 3 Opus.

Anthropic · API

Most capable Claude 3 model, March 2024. Supports image input. Source: Anthropic Claude 3 family announcement.

§ 01 · Benchmarks

Every benchmark Claude 3 Opus has a recorded score for.

| # | Benchmark | Area · Task | Metric | Value | Rank | Date |
|---|---|---|---|---|---|---|
| 01 | BIG-Bench Hard | Reasoning · Multi-step Reasoning | accuracy | 86.8% | #4/5 | |
| 02 | MMMU | Multimodal · Visual Question Answering | accuracy | 59.4% | #17/18 | 2024-03-04 |
| 03 | GSM8K | Reasoning · Mathematical Reasoning | accuracy | 95.0% | #20/32 | 2024-03-01 |
| 04 | GPQA | Reasoning · Multi-step Reasoning | accuracy | 50.4% | #27/33 | |
| 05 | MMLU | Reasoning · Commonsense Reasoning | accuracy | 86.8% | #29/41 | |
| 06 | HumanEval | Computer Code · Code Generation | pass@1 | 84.9% | #31/42 | |
| 07 | PLCC | Natural Language Processing · Polish Cultural Competency | art-and-entertainment | 73.0% | #31/165 | |
| 08 | MATH | Reasoning · Mathematical Reasoning | accuracy | 60.1% | #34/34 | |
| 09 | PLCC | Natural Language Processing · Polish Cultural Competency | history | 86.0% | #36/165 | |
| 10 | PLCC | Natural Language Processing · Polish Cultural Competency | culture-and-tradition | 76.0% | #46/165 | |
| 11 | PLCC | Natural Language Processing · Polish Cultural Competency | average | 73.8% | #49/165 | |
| 12 | PLCC | Natural Language Processing · Polish Cultural Competency | geography | 80.0% | #58/165 | |
| 13 | PLCC | Natural Language Processing · Polish Cultural Competency | vocabulary | 62.0% | #60/165 | |
| 14 | PLCC | Natural Language Processing · Polish Cultural Competency | grammar | 66.0% | #61/165 | |

The Rank column shows this model's position against all other models scored on the same benchmark and metric (number of competitors after the slash). A #1 shown in red marks the current SOTA. Rows are sorted by rank, then by newest result.

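The stated ordering (by rank, then newest result) can be sketched as a two-level sort key. This is a minimal illustration, not Codesota's actual code. Placing undated rows after dated rows of the same rank is an assumption, and the two `hypothetical-*` rows are invented only to show the tie-break.

```python
from datetime import date

# Three rows from the benchmark table, plus two hypothetical tied rows
# (not real results) that exercise the "newest result" tie-break.
rows = [
    {"benchmark": "GSM8K", "rank": 20, "date": date(2024, 3, 1)},
    {"benchmark": "BIG-Bench Hard", "rank": 4, "date": None},
    {"benchmark": "MMMU", "rank": 17, "date": date(2024, 3, 4)},
    {"benchmark": "hypothetical-A", "rank": 17, "date": date(2024, 3, 1)},
    {"benchmark": "hypothetical-B", "rank": 17, "date": None},
]


def sort_key(row):
    """Sort by rank ascending, then by date descending.

    Assumption: rows without a date come after dated rows of equal rank.
    """
    d = row["date"]
    return (row["rank"], d is None, -d.toordinal() if d else 0)


ordered = [r["benchmark"] for r in sorted(rows, key=sort_key)]
print(ordered)
```
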
§ 02 · Strengths by area

Where Claude 3 Opus actually performs.

Multimodal · 1 benchmark · avg rank #17.0
Reasoning · 5 benchmarks · avg rank #22.8
Computer Code · 1 benchmark · avg rank #31.0
Natural Language Processing · 1 benchmark · avg rank #48.7

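The per-area figures can be reproduced from the ranks in the benchmark table. The sketch below assumes the average is a plain mean over result rows (so PLCC contributes seven ranks to one benchmark). That rule is inferred from the numbers, not documented by the page, and the names `ROWS` and `summarize_by_area` are illustrative.

```python
from collections import defaultdict

# (area, benchmark, rank) for each of the 14 rows in the benchmark table.
ROWS = [
    ("Reasoning", "BIG-Bench Hard", 4),
    ("Multimodal", "MMMU", 17),
    ("Reasoning", "GSM8K", 20),
    ("Reasoning", "GPQA", 27),
    ("Reasoning", "MMLU", 29),
    ("Computer Code", "HumanEval", 31),
    ("Natural Language Processing", "PLCC", 31),
    ("Reasoning", "MATH", 34),
    ("Natural Language Processing", "PLCC", 36),
    ("Natural Language Processing", "PLCC", 46),
    ("Natural Language Processing", "PLCC", 49),
    ("Natural Language Processing", "PLCC", 58),
    ("Natural Language Processing", "PLCC", 60),
    ("Natural Language Processing", "PLCC", 61),
]


def summarize_by_area(rows):
    """Return {area: (distinct benchmark count, mean rank to 1 decimal)}.

    Assumption: the mean is taken over result rows, not over benchmarks.
    """
    ranks = defaultdict(list)
    benchmarks = defaultdict(set)
    for area, benchmark, rank in rows:
        ranks[area].append(rank)
        benchmarks[area].add(benchmark)
    return {
        area: (len(benchmarks[area]), round(sum(r) / len(r), 1))
        for area, r in ranks.items()
    }


summary = summarize_by_area(ROWS)
for area, (count, avg_rank) in summary.items():
    print(f"{area}: {count} benchmark(s), avg rank #{avg_rank}")
```

Under this assumption the Reasoning mean is (4 + 20 + 27 + 29 + 34) / 5 = 22.8 and the Natural Language Processing mean is 341 / 7, which rounds to 48.7, matching the figures above.
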
§ 03 · Papers

1 paper with results for Claude 3 Opus.

  1. 2024-03-04 · Multimodal · 1 result
     Claude 3 Model Family (Haiku, Sonnet, Opus)

§ 04 · Related models

Other Anthropic models scored on Codesota.

Claude Opus 4 · Undisclosed params · 13 results · 2 SOTA
Claude Opus 4.5 · 3 results · 2 SOTA
Claude Sonnet 5 · Undisclosed params · 2 results · 2 SOTA
Claude Sonnet 4 · 10 results · 1 SOTA
Claude Mythos Preview · 1 result · 1 SOTA
Claude 3.5 Sonnet · Undisclosed params · 27 results
Claude Opus 4.5 · Undisclosed params · 13 results
Claude 3.7 Sonnet · 10 results

§ 05 · Sources & freshness

Where these numbers come from.

sdadas/PLCC · 7 results
openai-simple-evals · 4 results
llm-stats-bbh · 1 result
arxiv · 1 result
gsm8k-shadow-page · 1 result

9 of 14 rows marked verified · first result 2024-03-01, latest 2024-03-04.