Qwen2-VL 72B · Alibaba · 8 results · 5 benchmarks
Model card

Qwen2-VL 72B.

Alibaba · open-source · Vision-Language Model · 1 current SOTA

The largest vision-language model in Alibaba's Qwen2-VL series, at 72B parameters.

§ 01 · Benchmarks

Every benchmark Qwen2-VL 72B has a recorded score for.

| #  | Benchmark | Area · Task | Metric | Value | Rank | Date |
|----|-----------|-------------|--------|-------|------|------|
| 01 | VQA v2.0 | Multimodal · Visual Question Answering | accuracy | 87.6% | #1/7 | 2024-09-18 |
| 02 | CC-OCR | Computer Vision · General OCR Capabilities | kie-f1 | 71.8% | #1/5 | |
| 03 | TextVQA | Multimodal · Visual Question Answering | accuracy | 84.9% | #2/9 | 2024-09-18 |
| 04 | CC-OCR | Computer Vision · General OCR Capabilities | document-parsing | 53.8% | #2/6 | |
| 05 | CC-OCR | Computer Vision · General OCR Capabilities | multi-scene-f1 | 78.0% | #2/9 | |
| 06 | MMBench | Multimodal · Visual Question Answering | accuracy | 88.0% | #3/8 | 2024-09-18 |
| 07 | CC-OCR | Computer Vision · General OCR Capabilities | multilingual-f1 | 71.1% | #3/8 | |
| 08 | MMMU | Multimodal · Visual Question Answering | accuracy | 64.5% | #14/18 | 2024-09-18 |

The Rank column shows this model’s position against all other models scored on the same benchmark and metric (competitors after the slash). A #1 rank, shown in red on the site, means current SOTA. Rows are sorted by rank, then by newest result.
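The ordering described in the note above can be sketched in a few lines of Python. This is an illustration only, not the site's actual code: the helper names `parse_rank` and `sort_key` are hypothetical, and placing undated rows after dated ones within the same rank is an assumption inferred from the order of the table.

```python
from datetime import date

def parse_rank(cell):
    """Split a rank cell such as '#14/18' into (position, number after the slash)."""
    position, _, after_slash = cell.lstrip("#").partition("/")
    return int(position), int(after_slash)

def sort_key(row):
    """Sort by rank first, then by newest result date."""
    position, _ = parse_rank(row["rank"])
    d = row.get("date")
    if d is None:
        # Assumption: rows without a date come after dated rows of equal rank.
        return (position, 1, 0)
    # Negate the ordinal so that newer dates sort first.
    return (position, 0, -date.fromisoformat(d).toordinal())

# A few rows copied from the table, deliberately shuffled.
rows = [
    {"benchmark": "MMMU", "rank": "#14/18", "date": "2024-09-18"},
    {"benchmark": "CC-OCR (kie-f1)", "rank": "#1/5", "date": None},
    {"benchmark": "TextVQA", "rank": "#2/9", "date": "2024-09-18"},
    {"benchmark": "VQA v2.0", "rank": "#1/7", "date": "2024-09-18"},
]

ordered = [r["benchmark"] for r in sorted(rows, key=sort_key)]
print(ordered)
```

Because Python's sort is stable, rows that tie on both rank and date keep their input order.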
§ 02 · Strengths by area

Where Qwen2-VL 72B actually performs.

Computer Vision: 1 benchmark · avg rank #2.0 · 1 SOTA
Multimodal: 4 benchmarks · avg rank #5.0
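The average ranks above are consistent with a plain mean over the scored rows in each area. The sketch below checks that arithmetic using the ranks from the § 01 table. It assumes the average is taken per row (benchmark plus metric), which matches the displayed figures but is not confirmed by the page; `average_rank_by_area` is a hypothetical helper name.

```python
from collections import defaultdict

# (area, rank) pairs copied from the benchmark table in § 01.
rows = [
    ("Multimodal", 1),        # VQA v2.0, accuracy
    ("Computer Vision", 1),   # CC-OCR, kie-f1
    ("Multimodal", 2),        # TextVQA, accuracy
    ("Computer Vision", 2),   # CC-OCR, document-parsing
    ("Computer Vision", 2),   # CC-OCR, multi-scene-f1
    ("Multimodal", 3),        # MMBench, accuracy
    ("Computer Vision", 3),   # CC-OCR, multilingual-f1
    ("Multimodal", 14),       # MMMU, accuracy
]

def average_rank_by_area(rows):
    """Group ranks by area and return the arithmetic mean for each area."""
    ranks = defaultdict(list)
    for area, rank in rows:
        ranks[area].append(rank)
    return {area: sum(r) / len(r) for area, r in ranks.items()}

avg = average_rank_by_area(rows)
for area, value in avg.items():
    print(f"{area}: avg rank #{value:.1f}")
```

The single MMMU result at rank 14 accounts for most of the Multimodal average; the other three Multimodal rows are all in the top three.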
§ 03 · Papers

1 paper with results for Qwen2-VL 72B.

  1. 2024-09-18 · Multimodal · 4 results
     Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

§ 04 · Related models

Other Alibaba models scored on Codesota.

Qwen2.5-72B-Instruct · 72B params · 4 results
Qwen2.5-Coder 32B · 32B params · 4 results
GOT-OCR2.0 · 3 results
Qwen 3 72B · 72B params · 2 results
Qwen2.5-VL 32B · 2 results
Qwen2.5-VL 72B · 72B params · 2 results
Qwen 3 14B · 14B params · 1 result
Qwen2-VL 7B · 7B params · 1 result
§ 05 · Sources & freshness

Where these numbers come from.

arxiv: 4 results
alphaxiv-leaderboard: 2 results
cc-ocr-paper: 2 results
6 of 8 rows marked verified.