Qwen2-VL 72B.

Alibaba · open-source · Vision-Language Model · 1 current SOTA

The 72B-parameter vision-language model in Alibaba's Qwen2-VL series.

§ 01 · Benchmarks

Every benchmark Qwen2-VL 72B has a recorded score for.

#	Benchmark	Area · Task	Metric	Value	Rank	Date	Source
01	VQA v2.0	Multimodal · Visual Question Answering	accuracy	87.6%	#1/7	2024-09-18	source ↗
02	CC-OCR	Computer Vision · General OCR Capabilities	kie-f1	71.8%	#1/5	—	source ↗
03	TextVQA	Multimodal · Visual Question Answering	accuracy	84.9%	#2/9	2024-09-18	source ↗
04	CC-OCR	Computer Vision · General OCR Capabilities	document-parsing	53.8%	#2/6	—	source ↗
05	CC-OCR	Computer Vision · General OCR Capabilities	multi-scene-f1	78.0%	#2/9	—	source ↗
06	MMBench	Multimodal · Visual Question Answering	accuracy	88.0%	#3/8	2024-09-18	source ↗
07	CC-OCR	Computer Vision · General OCR Capabilities	multilingual-f1	71.1%	#3/8	—	source ↗
08	MMMU	Multimodal · Visual Question Answering	accuracy	64.5%	#14/18	2024-09-18	source ↗

The Rank column shows this model's position among all models scored on the same benchmark and metric, with the number of competitors after the slash. A #1 rank, highlighted in red, marks the current SOTA. Rows are sorted by rank, then by newest result.
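The ordering rule for the table can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual code: the helper names `parse_rank`, `parse_date`, and `sort_key` are hypothetical, and placing undated rows after dated ones within the same rank is an assumption that happens to match the order of the table above.

```python
from datetime import date


def parse_rank(cell):
    """Split a Rank cell such as '#14/18' into (position, number after the slash)."""
    position, after_slash = cell.lstrip("#").split("/")
    return int(position), int(after_slash)


def parse_date(cell):
    """Return a date for 'YYYY-MM-DD', or None for the placeholder dash."""
    return None if cell == "—" else date.fromisoformat(cell)


def sort_key(row):
    _benchmark, _metric, rank_cell, date_cell = row
    position, _ = parse_rank(rank_cell)
    when = parse_date(date_cell)
    # Best rank first; within a rank, dated rows come before undated ones
    # (an assumption), and newer dates come before older ones.
    return (position, when is None, -when.toordinal() if when else 0)


# A subset of the rows from the table, deliberately shuffled.
rows = [
    ("MMMU", "accuracy", "#14/18", "2024-09-18"),
    ("CC-OCR", "kie-f1", "#1/5", "—"),
    ("TextVQA", "accuracy", "#2/9", "2024-09-18"),
    ("VQA v2.0", "accuracy", "#1/7", "2024-09-18"),
    ("MMBench", "accuracy", "#3/8", "2024-09-18"),
    ("CC-OCR", "multilingual-f1", "#3/8", "—"),
]

ordered = sorted(rows, key=sort_key)
for benchmark, metric, rank_cell, date_cell in ordered:
    print(f"{rank_cell:>7}  {benchmark:<9} {metric:<16} {date_cell}")
```

Because Python's `sorted` is stable, rows that tie on both rank and date keep their input order, so any further tie-breaking would have to be added to the key explicitly.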

§ 02 · Strengths by area

Where Qwen2-VL 72B actually performs.

Computer Vision · 1 benchmark · avg rank #2.0 · 1 SOTA

Multimodal · 4 benchmarks · avg rank #5.0
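The per-area averages can be checked against the benchmark table. The sketch below assumes the average is a plain arithmetic mean over every benchmark and metric row in an area; the page does not state its formula, but that assumption reproduces both figures. The function name `average_rank_by_area` is hypothetical.

```python
from collections import defaultdict

# (area, benchmark, metric, rank) copied from the benchmark table in § 01.
ranked_rows = [
    ("Multimodal", "VQA v2.0", "accuracy", 1),
    ("Computer Vision", "CC-OCR", "kie-f1", 1),
    ("Multimodal", "TextVQA", "accuracy", 2),
    ("Computer Vision", "CC-OCR", "document-parsing", 2),
    ("Computer Vision", "CC-OCR", "multi-scene-f1", 2),
    ("Multimodal", "MMBench", "accuracy", 3),
    ("Computer Vision", "CC-OCR", "multilingual-f1", 3),
    ("Multimodal", "MMMU", "accuracy", 14),
]


def average_rank_by_area(rows):
    """Group ranks by area and return the arithmetic mean for each area."""
    grouped = defaultdict(list)
    for area, _benchmark, _metric, rank in rows:
        grouped[area].append(rank)
    return {area: sum(ranks) / len(ranks) for area, ranks in grouped.items()}


averages = average_rank_by_area(ranked_rows)
for area, mean_rank in sorted(averages.items()):
    print(f"{area}: avg rank #{mean_rank:.1f}")
```

Under this reading, the single #14 rank on MMMU accounts for most of the gap between the two areas: the other three Multimodal ranks (1, 2, 3) would average 2.0 on their own.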

§ 03 · Papers

1 paper with results for Qwen2-VL 72B.

2024-09-18 · Multimodal · 4 results
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

§ 04 · Related models

Other Alibaba models scored on Codesota.

Qwen2.5-72B-Instruct

72B params · 4 results

Qwen2.5-Coder 32B

32B params · 4 results

GOT-OCR2.0

3 results

Qwen 3 72B

72B params · 2 results

Qwen2.5-VL 32B

2 results

Qwen2.5-VL 72B

72B params · 2 results

Qwen 3 14B

14B params · 1 result

Qwen2-VL 7B

7B params · 1 result

§ 05 · Sources & freshness

Where these numbers come from.

arxiv · results

alphaxiv-leaderboard · results

cc-ocr-paper · results

6 of 8 rows marked verified.
