
Computer Vision · video-classification

Video classification.

The task of classifying videos into predefined categories or classes. Video classification involves analyzing temporal sequences of frames to understand the content and assign appropriate labels to entire video clips.
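One simple way to turn a temporal sequence of frames into a single clip-level label is to sample frames evenly, score each sampled frame, and average the scores over time. The sketch below illustrates that idea only; it is not how any model listed on this page works. The helper names `uniform_frame_indices` and `classify_clip` are hypothetical, and the per-frame scores are assumed to come from some frame-level classifier that is not shown.

```python
from typing import List, Sequence


def uniform_frame_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick num_samples frame indices spread evenly across a clip."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    # Take the centre of each of num_samples equal-length temporal segments.
    return [
        min(num_frames - 1, int((i + 0.5) * num_frames / num_samples))
        for i in range(num_samples)
    ]


def classify_clip(
    frame_scores: Sequence[Sequence[float]], labels: Sequence[str]
) -> str:
    """Average per-frame class scores over time and return the best label."""
    if not frame_scores:
        raise ValueError("frame_scores must contain at least one frame")
    num_classes = len(labels)
    # Temporal mean pooling: one averaged score per class for the whole clip.
    mean_scores = [
        sum(frame[c] for frame in frame_scores) / len(frame_scores)
        for c in range(num_classes)
    ]
    best = max(range(num_classes), key=lambda c: mean_scores[c])
    return labels[best]


if __name__ == "__main__":
    print(uniform_frame_indices(num_frames=10, num_samples=4))
    # Made-up scores for three sampled frames and two classes.
    scores = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
    print(classify_clip(scores, ["run", "swim"]))
```

Mean pooling ignores frame order, so it cannot separate actions that differ only in temporal direction; models built for video typically learn temporal structure rather than averaging it away.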

6 datasets · 13 results · Canonical metric: top-1-accuracy

§ 02 · Canonical benchmark

The reference dataset.

Kinetics-400

Human action recognition across 400 action classes

Primary metric: top-1-accuracy
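Top-1 accuracy is the share of clips whose highest-scoring class matches the ground-truth label. The sketch below shows the calculation on made-up scores; the function name `top1_accuracy` is hypothetical, and the result is expressed as a percentage on the assumption that leaderboard scores such as 88.2 are percentages.

```python
from typing import Sequence


def top1_accuracy(
    scores: Sequence[Sequence[float]], targets: Sequence[int]
) -> float:
    """Return the percentage of clips whose top-scoring class is the target."""
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not targets:
        raise ValueError("at least one clip is required")
    correct = 0
    for row, target in zip(scores, targets):
        # Index of the highest score is the predicted class for this clip.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == target:
            correct += 1
    return 100.0 * correct / len(targets)


if __name__ == "__main__":
    # Four clips, two classes; the third clip is misclassified.
    clip_scores = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.6], [0.7, 0.3]]
    clip_targets = [1, 0, 0, 0]
    print(top1_accuracy(clip_scores, clip_targets))
```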


§ 03 · Top 10

Leading models.

Leading models on Kinetics-400.

#	Model	Top-1 accuracy (%)	Year	Source
1	DINOv3 (7B)	88.2	2025	paper
2	VideoMAE ViT-H ↑320	87.4	2022	paper
3	V-JEPA 2 ViT-g (1B, 384px)	87.3	2025	paper
4	VideoPrism-g	87.2	2024	paper
5	DINOv2 (ViT-g/14)	78.4	2023	paper


§ 04 · All datasets

Tracked datasets.

6 datasets tracked for this task.

Kinetics-400

5 results · top-1-accuracy

Top: DINOv3 (7B) — 88.2

Something-Something V2

5 results · top-1-accuracy

Top: V-JEPA 2 ViT-g (1B, 384px) — 77.3

3 results · top-1-accuracy

Top: VideoMAE ViT-B — 96.1

Epic-Kitchens-100 (EK100)

§ 05 · Related tasks

Other tasks in Computer Vision.

3D Understanding · Depth estimation · Document Image Classification · Document Layout Analysis · Document Parsing · Document Understanding · General OCR Capabilities · Handwriting Recognition
