Embedded AI
Starting with Hailo
The independent catalog for edge AI on Hailo accelerators. Chip lineup, Model Zoo benchmarks, and pre-compiled HEF files for YOLOv11, YOLO26, PaddleOCR, CLIP, and on-device Llama 3.
Our goal: be the Hugging Face for Hailo — one page per model, one click to a verified HEF for your chip.
Why Hailo for edge
Hailo builds NPUs that integrate all model memory directly on die — no external DRAM lookups during inference. That design hits very high perf-per-watt, which is why Hailo-8 ships inside the official Raspberry Pi 5 AI Kit and why Hailo-10H can run a 3B LLM on 2.5W.
Integrated memory
Weights and activations stay on-die. Predictable latency, no DRAM thrash, low power. The trade-off: models must fit or be partitioned by the Hailo compiler.
Fixed-point at deploy
Every model is quantized (INT8, or INT4 for LLMs on 10H) and compiled into an HEF binary. You train in PyTorch / ONNX, then run the Dataflow Compiler to produce the HEF.
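To make the fixed-point step concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in pure Python. This illustrates the general idea only; the actual Hailo Dataflow Compiler uses calibration data and per-layer schemes, and the function names here are ours, not Hailo's.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: floats -> ints in [-127, 127] plus a scale."""
    scale = max(abs(w) for w in weights) / 127.0  # map the largest magnitude to 127
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; worst-case rounding error is about scale / 2."""
    return [v * scale for v in q]

weights = [0.8, -1.27, 0.02, 0.5]
q, scale = quantize_int8(weights)   # q == [80, -127, 2, 50], scale == 0.01
restored = dequantize(q, scale)
```

The quantized values plus one float scale per tensor are what ends up baked into the HEF; the quality of that scale is exactly why calibration data matters.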
Linux-first stack
HailoRT runs on x86, ARM, and Raspberry Pi. GStreamer, Python, C++ bindings. No CUDA, no drivers hell — a kernel module and a userspace runtime.
Hailo chip lineup
From 7 TOPS vision SoCs to 40 TOPS generative-AI accelerators. Pick the chip, then pick the model below.
| Chip | Family | Performance | Power | Form factor | Best for | Status |
|---|---|---|---|---|---|---|
| Hailo-8L | Accelerator | 13 TOPS (INT8) | ~1.5 W typical | M.2 / PCIe | Cost-sensitive edge: single-stream detection, smart home, POS | Shipping |
| Hailo-8 | Accelerator | 26 TOPS (INT8) | ~2.5 W typical | M.2 / PCIe / SoM | Multi-stream CV: smart cameras, retail analytics, Raspberry Pi 5 AI kit | Shipping |
| Hailo-10H | Accelerator | 40 TOPS (INT4) | ~2.5 W typical | M.2 | On-device LLMs/VLMs, Llama 3 8B at 10+ tok/s, generative edge AI | New |
| Hailo-15H | Vision Processor | 20 TOPS (INT8) | ~3-5 W | SoC (VPU) | High-end smart cameras with on-chip ISP + NN core | Shipping |
| Hailo-15L | Vision Processor | 7 TOPS (INT8) | ~2 W | SoC (VPU) | Mass-market IP cameras replacing traditional SoCs | Mass market |
Hailo-10H launched July 2025 as the first edge accelerator with on-chip generative-AI capability. Hailo-8 remains the workhorse for multi-stream computer vision.
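The TOPS and typical-power columns above imply a rough perf-per-watt ranking, sketched below. Note the caveat baked into the table: the Hailo-10H figure is INT4 TOPS while the others are INT8, so its number is not directly comparable; the 15H power is taken as the midpoint of its 3-5 W range.

```python
# (TOPS, typical watts) copied from the lineup table above.
chips = {
    "Hailo-8L":  (13, 1.5),
    "Hailo-8":   (26, 2.5),
    "Hailo-10H": (40, 2.5),   # INT4 TOPS -- not apples-to-apples with INT8 rows
    "Hailo-15H": (20, 4.0),   # midpoint of the 3-5 W range
    "Hailo-15L": (7,  2.0),
}

tops_per_watt = {name: tops / watts for name, (tops, watts) in chips.items()}
best = max(tops_per_watt, key=tops_per_watt.get)  # Hailo-10H at 16 TOPS/W
```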
Pre-compiled model catalog
FPS numbers are from the public Hailo Model Zoo benchmark tables — INT8, batch 1, on reference boards. LLM rows are decode throughput at INT4 on Hailo-10H with 2K context.
Detection
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| YOLOv11n | 640×640 | 2.6M | 135 | 210 | 260 | 240 | Latest YOLO nano, NMS on-chip |
| YOLOv11s | 640×640 | 9.4M | 72 | 140 | 175 | 160 | Balanced accuracy/speed |
| YOLOv11m | 640×640 | 20.1M | 38 | 70 | 95 | 85 | Higher mAP for demanding scenes |
| YOLOv8n | 640×640 | 3.2M | 150 | 235 | 275 | 255 | Most deployed edge detector |
| YOLOv8s | 640×640 | 11.2M | 78 | 150 | 180 | 165 | |
Detection / Seg
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| YOLO26n | 640×640 | 3.0M | — | — | 250 | 230 | NMS-free, newest family |
Oriented BBox
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| YOLOv11n-obb | 640×640 | 2.7M | — | — | 210 | 195 | Rotated boxes for aerial/industrial |
Instance Seg
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| YOLOv8n-seg | 640×640 | 3.4M | 85 | 155 | 190 | 175 | |
| YOLOv5n-seg-hpp | 640×640 | 2.0M | 120 | 195 | 230 | 215 | HailoRT-accelerated post-process |
Pose
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| YOLOv8n-pose | 640×640 | 3.3M | 88 | 160 | 195 | 180 | 17-keypoint human pose |
Classification
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| ResNet-50 | 224×224 | 25.6M | 720 | 1,390 | 1,750 | 1,500 | ImageNet reference |
| MobileNet V3 | 224×224 | 5.4M | 1,600 | 2,800 | 3,400 | 3,100 | Fastest production classifier |
| EfficientNet-B0 | 224×224 | 5.3M | 1,020 | 1,850 | 2,300 | 2,050 | |
OCR
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| PaddleOCR-v5 (det+rec) | Multi | ~12M | 22 | 45 | 65 | 58 | Latest PP-OCR pipeline |
Face Detection
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| RetinaFace MobileNet | 736×1280 | 0.4M | 85 | 140 | 165 | 150 | |
Face Recognition
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| ArcFace R50 | 112×112 | 43.6M | 380 | 720 | 890 | 800 | |
Monocular Depth
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| FastDepth | 224×224 | 3.9M | 380 | 640 | 790 | 710 | |
Embeddings
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| CLIP ViT-L/14 (Laion2B) | 224×224 | 304M | — | — | 42 | 28 | Image embeddings for retrieval |
LLM (INT4)
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| Llama 3.2 3B | — | 3.2B | — | — | 28 | — | tok/s decode, 2K ctx |
| Llama 3.1 8B | — | 8.0B | — | — | 11 | — | tok/s decode, 2K ctx |
| Qwen 2.5 1.5B | — | 1.5B | — | — | 45 | — | tok/s decode |
Numbers under LLM columns are tokens/sec (decode). Everything else is frames/sec. A dash means the model isn’t officially compiled for that chip — usually because it exceeds the SRAM budget or the chip targets a different task class.
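Two quick ways to turn these benchmark numbers into deployment answers: batch-1 FPS divided by a per-stream target gives the number of camera streams one chip can serve (assuming throughput divides cleanly across streams, which is optimistic), and tokens/sec gives wall-clock time for a reply of a given length.

```python
def streams_at(fps_benchmark, per_stream_fps=30.0):
    """How many full camera streams a single chip can serve at the given per-stream rate."""
    return int(fps_benchmark // per_stream_fps)

def decode_seconds(num_tokens, tok_per_s):
    """Wall-clock decode time for num_tokens at a given tokens/sec (ignores prefill)."""
    return num_tokens / tok_per_s

# YOLOv8n on Hailo-8 benchmarks at 235 FPS (table above):
n_streams = streams_at(235)                     # 7 full 30 FPS streams
# Llama 3.1 8B decodes at 11 tok/s on Hailo-10H:
reply_s = round(decode_seconds(256, 11.0), 1)   # ~23.3 s for a 256-token reply
```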
What is an HEF?
HEF (Hailo Executable Format) is the compiled binary that actually runs on a Hailo chip. You can’t load a PyTorch or ONNX model directly — the Hailo Dataflow Compiler converts it, quantizes the weights, maps ops onto the NPU’s cores and memory, and produces a single .hef file.
Compile pipeline
- Train or download a model in ONNX / TF / PyTorch
- Run `hailo parser` → Hailo Archive (HAR)
- Run `hailo optimize` with a calibration set → quantized HAR
- Run `hailo compile` → HEF
- Load with HailoRT: `.hef` → ConfiguredNetworkGroup → inference
Why pre-compiled HEFs matter
- Compile step takes minutes to hours and needs a license
- Quantization results depend on calibration data — bad calibration, bad accuracy
- Each chip has a different HEF (Hailo-8 HEF doesn’t run on 10H)
- Most deployments want “give me YOLOv11n for Hailo-8, verified” — not a compile pipeline
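Verifying a downloaded HEF against its published SHA256 is a few lines of stdlib Python. The filename and expected digest below are placeholders; the real digest would come from the model's catalog page.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large HEFs never need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# EXPECTED comes from the model page; the path is hypothetical.
# assert sha256_of("yolov11n_hailo8.hef") == EXPECTED, "corrupted or tampered download"
```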
Roadmap
This page is the MVP of a bigger plan — become the Hugging Face of Hailo-compiled models.
Hailo chip + Model Zoo catalog
Chip spec table, per-task FPS benchmarks, links to the official Hailo Model Zoo source.
Per-model pages with HEF downloads
One page per model × chip combo. Verified HEF binary, SHA256, calibration notes, latency numbers, example inference code.
Compile-on-demand
Upload an ONNX model, pick a Hailo chip, get back a compiled HEF. Behind the scenes: our inference server runs the Hailo Dataflow Compiler.
Other edge NPUs
Same catalog pattern for Google Coral (Edge TPU), NVIDIA Jetson, Rockchip RK3588 NPU, and Qualcomm QCS.
Resources
Hailo Model Zoo (GitHub)
Source of truth for officially supported models and HEF binaries.
Hailo product pages
Official spec sheets for Hailo-8, 8L, 10H, and Hailo-15 VPUs.
Hailo Community
Forum where Hailo engineers answer compile and runtime questions.
CodeSOTA Hardware
Our datacenter / consumer GPU catalog — the counterpart to this edge-AI section.
Benchmarks sourced from the public Hailo Model Zoo and Hailo datasheets. Page maintained by CodeSOTA. Last updated April 2026.