Kimi K2.6

arXiv:pwc-82965 · Submitted Apr 20, 2026 · 6 benchmark results

Authors pending

Results

6 results reproduced from this paper.

Results: 5 · SOTA rows: 1 · Models: 1 · Datasets: 0

#	Model	Vendor	Benchmark	Value	SOTA	Date	Source
01	Kimi K2.6	Moonshot AI	GPQA Diamond	90.5%	—	—	source ↗
02	Kimi K2.6	Moonshot AI	BrowseComp	83.2%	—	—	source ↗
03	Kimi K2.6	Moonshot AI	SWE-Bench Verified	80.2%	—	—	source ↗
04	Kimi K2.6	Moonshot AI	MMMU-Pro	79.4%	—	—	source ↗
05	Kimi K2.6	Moonshot AI	HLE	54.0%	#1	—	source ↗

§ 02 · Models

1 model from this paper.
