GLM-5: from Vibe Coding to Agentic Engineering

arXiv:2602.15763 · Submitted Feb 17, 2026 · 8 benchmark results

Authors pending

Results

8 results reproduced from this paper.

#	Model	Vendor	Benchmark	Score	SOTA
01	GLM-5	Zhipu AI	Tau2-Bench	89.7%	#1
02	GLM-5.1	-	GPQA Diamond	86.2%	-
03	GLM-5	Zhipu AI	GPQA Diamond	86.0%	-
04	GLM-5	Zhipu AI	SWE-Bench Verified	77.8%	-
05	GLM-5.1	-	BrowseComp	68.0%	-
06	GLM-5	Zhipu AI	BrowseComp	62.0%	-
07	GLM-5.1	-	HLE	31.0%	-
08	GLM-5	Zhipu AI	HLE	30.5%	-