Model card

Llama 3.1 405B

Meta · open-source · 13 results · 12 benchmarks

Meta Llama 3.1, 405B-parameter instruct variant. Released July 2024.

§ 01 · Benchmarks

Every benchmark Llama 3.1 405B has a recorded score for.

#  | Benchmark      | Area · Task                                              | Metric        | Value | Rank   | Date       | Source
01 | HellaSwag      | Reasoning · Commonsense Reasoning                        | accuracy      | 89.0% | #3/5   |            | source ↗
02 | CNN/DailyMail  | Natural Language Processing · Text Summarization         | rouge-1       | 45.1% | #4/6   | 2024-07-31 | source ↗
03 | CNN/DailyMail  | Natural Language Processing · Text Summarization         | rouge-l       | 42.3% | #4/6   | 2024-07-31 | source ↗
04 | CoNLL-2003     | Natural Language Processing · Named Entity Recognition   | f1            | 90.6% | #4/7   | 2024-07-31 | source ↗
05 | SNLI           | Natural Language Processing · Natural Language Inference | accuracy      | 91.2% | #5/8   | 2024-07-31 | source ↗
06 | BIG-Bench Hard | Reasoning · Multi-step Reasoning                         | accuracy      | 85.9% | #5/5   |            | source ↗
07 | SuperGLUE      | Natural Language Processing · Text Classification        | average-score | 86.7% | #6/7   | 2024-07-31 | source ↗
08 | ARC-Challenge  | Reasoning · Commonsense Reasoning                        | accuracy      | 96.9% | #6/10  |            | source ↗
09 | SQuAD v2.0     | Natural Language Processing · Question Answering         | f1            | 88.7% | #12/22 | 2024-07-31 | source ↗
10 | HumanEval      | Computer Code · Code Generation                          | pass@1        | 89.0% | #20/42 |            | source ↗
11 | MMLU           | Reasoning · Commonsense Reasoning                        | accuracy      | 88.6% | #21/41 |            | source ↗
12 | GPQA           | Reasoning · Multi-step Reasoning                         | accuracy      | 50.7% | #26/33 |            | source ↗
13 | MATH           | Reasoning · Mathematical Reasoning                       | accuracy      | 73.8% | #28/34 |            | source ↗
The Rank column shows this model's position among all models scored on the same benchmark and metric (the total number of competitors appears after the slash); rank #1 marks the current SOTA. Rows are sorted by rank, then by newest result.
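A minimal sketch of how a rank like "#3/5" is presumably derived: sort all recorded scores for a benchmark + metric best-first and take the model's 1-based position. The competitor scores below are hypothetical, chosen only to reproduce HellaSwag's recorded 89.0% at rank #3/5; the actual leaderboard scores are not listed on this page.

```python
def rank_of(score: float, all_scores: list[float]) -> str:
    """Rank a score among all scores on the same benchmark + metric.

    Assumes higher is better, as it is for every metric in the table above.
    """
    ordered = sorted(all_scores, reverse=True)  # best score first
    return f"#{ordered.index(score) + 1}/{len(all_scores)}"

# Hypothetical field of 5 HellaSwag accuracy scores including the
# recorded 89.0%; reproduces the table's "#3/5".
scores = [95.6, 92.1, 89.0, 87.3, 85.0]
print(rank_of(89.0, scores))  # → "#3/5"
```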
§ 02 · Strengths by area

Where Llama 3.1 405B actually performs.

Natural Language Processing · 5 benchmarks · avg rank #5.8
Reasoning · 6 benchmarks · avg rank #14.8
Computer Code · 1 benchmark · avg rank #20.0
§ 03 · Papers

1 paper with results for Llama 3.1 405B.

  1. The Llama 3 Herd of Models · 2024-07-31 · Natural Language Processing · 6 results

§ 04 · Related models

Other Meta models scored on Codesota.

DeiT-B Distilled · 86M params · 2 results · 1 SOTA
Llama 3 70B · 8 results
Llama-4-Maverick · 400B total / 17B active (128 experts) params · 6 results
Llama 3.1 70B · 4 results
Code Llama 34B · Unknown params · 2 results
ConvNeXt V2 Huge · 650M params · 2 results
CodeLlama 70B · 70B params · 1 result
ConvNeXt V2 Base · 89M params · 1 result
§ 05 · Sources & freshness

Where these numbers come from.

arxiv · 6 results
openai-simple-evals · 4 results
meta-modelcard · 2 results
llm-stats-bbh · 1 result
9 of 13 rows marked verified.