InternVL2-76B.

Shanghai AI Lab · open-source · 76B params · Vision-Language Model · MIT

§ 01 · Benchmarks

Every benchmark InternVL2-76B has a recorded score for.

#	Benchmark	Area · Task	Metric	Value	Rank	Date	Source
01	VQA v2.0	Multimodal · Visual Question Answering	accuracy	87.2%	#2/7	2024-04-25	source ↗
02	TextVQA	Multimodal · Visual Question Answering	accuracy	84.4%	#3/9	2024-04-25	source ↗
03	CC-OCR	Computer Vision · General OCR Capabilities	multi-scene-f1	76.9%	#3/9	—	source ↗
04	MMBench	Multimodal · Visual Question Answering	accuracy	86.5%	#4/8	2024-04-25	source ↗
05	CC-OCR	Computer Vision · General OCR Capabilities	kie-f1	61.6%	#5/5	—	source ↗
06	CC-OCR	Computer Vision · General OCR Capabilities	document-parsing	35.3%	#6/6	—	source ↗
07	CC-OCR	Computer Vision · General OCR Capabilities	multilingual-f1	46.6%	#6/8	—	source ↗
08	MMMU	Multimodal · Visual Question Answering	accuracy	67.4%	#13/18	2024-04-25	source ↗

The Rank column shows this model's position among all models scored on the same benchmark and metric, with the number of competitors after the slash. A #1 rank, shown in red, marks the current SOTA. Rows are sorted by rank, then by newest result.
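The ordering described above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual code: the `Row` type, `parse_rank`, and `sort_key` are hypothetical names, and placing undated rows after dated ones within the same rank is an assumption inferred from the table (TextVQA precedes the undated CC-OCR row at #3/9).

```python
from datetime import date
from typing import NamedTuple, Optional


class Row(NamedTuple):
    benchmark: str
    metric: str
    rank: str                     # as displayed, e.g. "#2/7"
    result_date: Optional[date]   # None where the Date column is blank


def parse_rank(text: str) -> tuple[int, int]:
    """Split '#2/7' into (2, 7): position, then the number after the slash."""
    position, after_slash = text.lstrip("#").split("/")
    return int(position), int(after_slash)


def sort_key(row: Row) -> tuple[int, bool, int]:
    """Rank ascending, then newest result first; undated rows last (assumed)."""
    position, _ = parse_rank(row.rank)
    undated = row.result_date is None
    # Negate the ordinal so newer dates sort first within the same rank.
    newest_first = 0 if undated else -row.result_date.toordinal()
    return (position, undated, newest_first)


# A subset of the rows from the table above, deliberately out of order.
rows = [
    Row("MMMU", "accuracy", "#13/18", date(2024, 4, 25)),
    Row("CC-OCR", "multi-scene-f1", "#3/9", None),
    Row("MMBench", "accuracy", "#4/8", date(2024, 4, 25)),
    Row("TextVQA", "accuracy", "#3/9", date(2024, 4, 25)),
    Row("VQA v2.0", "accuracy", "#2/7", date(2024, 4, 25)),
]

ordered = sorted(rows, key=sort_key)
for row in ordered:
    print(row.rank, row.benchmark, row.metric)
```

Because Python's sort is stable, rows that tie on both rank and date keep their input order, so a further tie-breaker would be needed to make the listing fully deterministic.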

§ 02 · Strengths by area

Where InternVL2-76B actually performs.

§ 03 · Papers

1 paper with results for InternVL2-76B.

2024-04-25 · Multimodal · 4 results
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

§ 04 · Related models

Other Shanghai AI Lab models scored on Codesota.

InternImage-H

2 results · 1 SOTA

InternImage-H

Unknown params · 1 result

InternVL3-76B

1 result

InternVL3-78B

78B params · 1 result

§ 05 · Sources & freshness

Where these numbers come from.

arxiv

results

cc-ocr-paper

results

alphaxiv-leaderboard

result

7 of 8 rows marked verified.