Embedded AI
Starting with Hailo
The independent catalog for edge AI on Hailo accelerators. Chip lineup, Model Zoo benchmarks, and pre-compiled HEF files for YOLOv11, YOLO26, PaddleOCR, CLIP, and on-device Llama 3.
Our goal: be the Hugging Face for Hailo — one page per model, one click to a verified HEF for your chip.
Why Hailo for edge
Hailo builds NPUs that integrate all model memory directly on die — no external DRAM lookups during inference. That design hits very high perf-per-watt, which is why Hailo-8 ships inside the official Raspberry Pi 5 AI Kit and why Hailo-10H can run a 3B LLM on 2.5W.
Integrated memory
Weights and activations stay on-die. Predictable latency, no DRAM thrash, low power. The trade-off: models must fit or be partitioned by the Hailo compiler.
Fixed-point at deploy
Every model is quantized (INT8, or INT4 for LLMs on 10H) and compiled into an HEF binary. You train in PyTorch / ONNX, then run the Dataflow Compiler to produce the HEF.
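To make the fixed-point step concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in pure Python. This illustrates the general idea only; the actual Hailo Dataflow Compiler uses calibration data and per-layer schemes, and the function names here are ours, not Hailo's.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: floats -> ints in [-127, 127] plus a scale."""
    scale = max(abs(w) for w in weights) / 127.0  # map the largest magnitude to 127
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; worst-case rounding error is about scale / 2."""
    return [v * scale for v in q]

weights = [0.8, -1.27, 0.02, 0.5]
q, scale = quantize_int8(weights)   # q == [80, -127, 2, 50], scale == 0.01
restored = dequantize(q, scale)
```

The quantized values plus one float scale per tensor are what ends up baked into the HEF; the quality of that scale is exactly why calibration data matters.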
Linux-first stack
HailoRT runs on x86, ARM, and Raspberry Pi. GStreamer, Python, C++ bindings. No CUDA, no drivers hell — a kernel module and a userspace runtime.
Hailo chip lineup
From 7 TOPS vision SoCs to 40 TOPS generative-AI accelerators. Pick the chip, then pick the model below.
| Chip | Family | Performance | Power | Form factor | Best for | Status |
|---|---|---|---|---|---|---|
| Hailo-8L | Accelerator | 13 TOPS (INT8) | ~1.5 W typical | M.2 / PCIe | Cost-sensitive edge: single-stream detection, smart home, POS | Shipping |
| Hailo-8 | Accelerator | 26 TOPS (INT8) | ~2.5 W typical | M.2 / PCIe / SoM | Multi-stream CV: smart cameras, retail analytics, Raspberry Pi 5 AI kit | Shipping |
| Hailo-10H | Accelerator | 40 TOPS (INT4) | ~2.5 W typical | M.2 | On-device LLMs/VLMs, Llama 3 8B at 10+ tok/s, generative edge AI | New |
| Hailo-15H | Vision Processor | 20 TOPS (INT8) | ~3-5 W | SoC (VPU) | High-end smart cameras with on-chip ISP + NN core | Shipping |
| Hailo-15L | Vision Processor | 7 TOPS (INT8) | ~2 W | SoC (VPU) | Mass-market IP cameras replacing traditional SoCs | Mass market |
Hailo-10H launched July 2025 as the first edge accelerator with on-chip generative-AI capability. Hailo-8 remains the workhorse for multi-stream computer vision.
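The TOPS and typical-power columns above imply a rough perf-per-watt ranking, sketched below. Note the caveat baked into the table: the Hailo-10H figure is INT4 TOPS while the others are INT8, so its number is not directly comparable; the 15H power is taken as the midpoint of its 3-5 W range.

```python
# (TOPS, typical watts) copied from the lineup table above.
chips = {
    "Hailo-8L":  (13, 1.5),
    "Hailo-8":   (26, 2.5),
    "Hailo-10H": (40, 2.5),   # INT4 TOPS -- not apples-to-apples with INT8 rows
    "Hailo-15H": (20, 4.0),   # midpoint of the 3-5 W range
    "Hailo-15L": (7,  2.0),
}

tops_per_watt = {name: tops / watts for name, (tops, watts) in chips.items()}
best = max(tops_per_watt, key=tops_per_watt.get)  # Hailo-10H at 16 TOPS/W
```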
Pre-compiled model catalog
FPS numbers are from the public Hailo Model Zoo benchmark tables — INT8, batch 1, on reference boards. LLM rows are decode throughput at INT4 on Hailo-10H with 2K context.
Detection
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| YOLOv11n | 640×640 | 2.6M | 135 | 210 | 260 | 240 | Latest YOLO nano, NMS on-chip |
| YOLOv11s | 640×640 | 9.4M | 72 | 140 | 175 | 160 | Balanced accuracy/speed |
| YOLOv11m | 640×640 | 20.1M | 38 | 70 | 95 | 85 | Higher mAP for demanding scenes |
| YOLOv8n | 640×640 | 3.2M | 150 | 235 | 275 | 255 | Most deployed edge detector |
| YOLOv8s | 640×640 | 11.2M | 78 | 150 | 180 | 165 | |
Detection / Seg
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| YOLO26n | 640×640 | 3.0M | — | — | 250 | 230 | NMS-free, newest family |
Oriented BBox
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| YOLOv11n-obb | 640×640 | 2.7M | — | — | 210 | 195 | Rotated boxes for aerial/industrial |
Instance Seg
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| YOLOv8n-seg | 640×640 | 3.4M | 85 | 155 | 190 | 175 | |
| YOLOv5n-seg-hpp | 640×640 | 2.0M | 120 | 195 | 230 | 215 | HailoRT-accelerated post-process |
Pose
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| YOLOv8n-pose | 640×640 | 3.3M | 88 | 160 | 195 | 180 | 17-keypoint human pose |
Classification
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| ResNet-50 | 224×224 | 25.6M | 720 | 1,390 | 1,750 | 1,500 | ImageNet reference |
| MobileNet V3 | 224×224 | 5.4M | 1,600 | 2,800 | 3,400 | 3,100 | Fastest production classifier |
| EfficientNet-B0 | 224×224 | 5.3M | 1,020 | 1,850 | 2,300 | 2,050 | |
OCR
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| PaddleOCR-v5 (det+rec) | Multi | ~12M | 22 | 45 | 65 | 58 | Latest PP-OCR pipeline |
Face Detection
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| RetinaFace MobileNet | 736×1280 | 0.4M | 85 | 140 | 165 | 150 | |
Face Recognition
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| ArcFace R50 | 112×112 | 43.6M | 380 | 720 | 890 | 800 | |
Monocular Depth
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| FastDepth | 224×224 | 3.9M | 380 | 640 | 790 | 710 | |
Embeddings
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| CLIP ViT-L/14 (Laion2B) | 224×224 | 304M | — | — | 42 | 28 | Image embeddings for retrieval |
LLM (INT4)
| Model | Input | Params | Hailo-8L | Hailo-8 | Hailo-10H | Hailo-15H | Notes |
|---|---|---|---|---|---|---|---|
| Llama 3.2 3B | — | 3.2B | — | — | 28 | — | tok/s decode, 2K ctx |
| Llama 3.1 8B | — | 8.0B | — | — | 11 | — | tok/s decode, 2K ctx |
| Qwen 2.5 1.5B | — | 1.5B | — | — | 45 | — | tok/s decode |
Numbers under LLM columns are tokens/sec (decode). Everything else is frames/sec. A dash means the model isn’t officially compiled for that chip — usually because it exceeds the SRAM budget or the chip targets a different task class.
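Two quick ways to turn these benchmark numbers into deployment answers: batch-1 FPS divided by a per-stream target gives the number of camera streams one chip can serve (assuming throughput divides cleanly across streams, which is optimistic), and tokens/sec gives wall-clock time for a reply of a given length.

```python
def streams_at(fps_benchmark, per_stream_fps=30.0):
    """How many full camera streams a single chip can serve at the given per-stream rate."""
    return int(fps_benchmark // per_stream_fps)

def decode_seconds(num_tokens, tok_per_s):
    """Wall-clock decode time for num_tokens at a given tokens/sec (ignores prefill)."""
    return num_tokens / tok_per_s

# YOLOv8n on Hailo-8 benchmarks at 235 FPS (table above):
n_streams = streams_at(235)                     # 7 full 30 FPS streams
# Llama 3.1 8B decodes at 11 tok/s on Hailo-10H:
reply_s = round(decode_seconds(256, 11.0), 1)   # ~23.3 s for a 256-token reply
```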
What is an HEF?
HEF (Hailo Executable Format) is the compiled binary that actually runs on a Hailo chip. You can’t load a PyTorch or ONNX model directly — the Hailo Dataflow Compiler converts it, quantizes the weights, maps ops onto the NPU’s cores and memory, and produces a single .hef file.
Compile pipeline
- Train or download a model in ONNX / TF / PyTorch
- Run `hailo parser` → Hailo Archive (HAR)
- Run `hailo optimize` with a calibration set → quantized HAR
- Run `hailo compile` → HEF
- Load with HailoRT: `.hef` → ConfiguredNetworkGroup → inference
Why pre-compiled HEFs matter
- Compile step takes minutes to hours and needs a license
- Quantization results depend on calibration data — bad calibration, bad accuracy
- Each chip has a different HEF (Hailo-8 HEF doesn’t run on 10H)
- Most deployments want “give me YOLOv11n for Hailo-8, verified” — not a compile pipeline
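Verifying a downloaded HEF against its published SHA256 is a few lines of stdlib Python. The filename and expected digest below are placeholders; the real digest would come from the model's catalog page.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large HEFs never need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# EXPECTED comes from the model page; the path is hypothetical.
# assert sha256_of("yolov11n_hailo8.hef") == EXPECTED, "corrupted or tampered download"
```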
Roadmap
This page is the MVP of a bigger plan — become the Hugging Face of Hailo-compiled models.
Hailo chip + Model Zoo catalog
Chip spec table, per-task FPS benchmarks, links to the official Hailo Model Zoo source.
Per-model pages with HEF downloads
One page per model × chip combo. Verified HEF binary, SHA256, calibration notes, latency numbers, example inference code.
Compile-on-demand
Upload an ONNX model, pick a Hailo chip, get back a compiled HEF. Behind the scenes: our inference server runs the Hailo Dataflow Compiler.
Other edge NPUs
Same catalog pattern for Google Coral (Edge TPU), NVIDIA Jetson, Rockchip RK3588 NPU, and Qualcomm QCS.
Resources
Hailo Model Zoo (GitHub)
Source of truth for officially supported models and HEF binaries.
Hailo product pages
Official spec sheets for Hailo-8, 8L, 10H, and Hailo-15 VPUs.
Hailo Community
Forum where Hailo engineers answer compile and runtime questions.
CodeSOTA Hardware
Our datacenter / consumer GPU catalog — the counterpart to this edge-AI section.
Benchmarks sourced from the public Hailo Model Zoo and Hailo datasheets. Page maintained by CodeSOTA. Last updated April 2026.