GPT-4V.

Unknown · multimodal · Unknown params · Transformer

GPT-4 with Vision, the first major multimodal GPT-4 release (September 2023). Evaluated on MMBench, TextVQA, VQA v2.0, and MMMU. Source: GPT-4 Technical Report.

§ 01 · Benchmarks

Every benchmark GPT-4V has a recorded score for.

#	Benchmark	Area · Task	Metric	Value	Rank	Date	Source
01	MMBench	Multimodal · Visual Question Answering	accuracy	75.8%	#6/8	2023-03-15	source ↗
02	TextVQA	Multimodal · Visual Question Answering	accuracy	78.0%	#6/9	2023-03-15	source ↗
03	VQA v2.0	Multimodal · Visual Question Answering	accuracy	77.2%	#7/7	2023-03-15	source ↗
04	MMMU	Multimodal · Visual Question Answering	accuracy	56.8%	#18/18	2023-03-15	source ↗

The Rank column shows this model’s position among all models scored on the same benchmark and metric; the number after the slash is the total number of models scored. A #1 shown in red marks the current SOTA. Rows are sorted by rank, then by newest result.
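
The sorting rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the site's actual implementation: the helper names `parse_rank` and `sort_rows` are hypothetical, and it assumes every rank string has the form `#position/total`. The rows are copied from the table in this section.

```python
from datetime import date

# Rows copied from the table above: (benchmark, rank string, result date).
# Deliberately listed out of order so the sort has something to do.
ROWS = [
    ("MMMU", "#18/18", date(2023, 3, 15)),
    ("VQA v2.0", "#7/7", date(2023, 3, 15)),
    ("TextVQA", "#6/9", date(2023, 3, 15)),
    ("MMBench", "#6/8", date(2023, 3, 15)),
]


def parse_rank(rank: str) -> tuple[int, int]:
    """Split a rank string such as '#6/8' into (position, models_scored).

    Assumes the '#position/total' format; anything else raises ValueError.
    """
    position, total = rank.lstrip("#").split("/")
    return int(position), int(total)


def sort_rows(rows):
    """Order rows by rank position (ascending), then by newest date first."""
    return sorted(
        rows,
        key=lambda row: (parse_rank(row[1])[0], -row[2].toordinal()),
    )


ordered = sort_rows(ROWS)
for benchmark, rank, when in ordered:
    position, total = parse_rank(rank)
    print(f"{benchmark}: position {position} of {total} ({when.isoformat()})")
```

Because MMBench and TextVQA share both rank position and date, their relative order here depends only on input order (Python's sort is stable); the page may use a further tie-breaker that is not documented in this section.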

§ 02 · Strengths by area