Anomaly Detection for Manufacturing
From MVTec benchmarks to production inspection lines
Manual visual inspection catches 80% of defects on a good day. Trained anomaly detection models hit 99.8%. This guide compares six leading approaches, shows you how to deploy them, and gives you the numbers to build the business case.
TL;DR
- Best overall: EfficientAD (99.8% AUROC, real-time speed) for most production lines
- Best accuracy per sample: PatchCore (99.6% with as few as 10 normal images)
- Zero-shot option: WinCLIP if you have zero training images, but expect a 4-5% accuracy gap
- Need explainability? AnomalyGPT gives natural language defect descriptions but is 10-20x slower
- Deployment: ONNX export to Jetson for edge; FastAPI + ONNX Runtime for centralized
Why Manufacturing Needs Anomaly Detection
Traditional quality control relies on rule-based machine vision (thresholding, template matching) or human inspectors. Both break down as product complexity increases:
- Rule-based vision requires explicit programming for every defect type. A new scratch pattern means a new rule. Natural variation in materials triggers false positives.
- Human inspectors fatigue after 20-30 minutes of sustained attention. Accuracy drops from ~95% to ~75% over a shift. They cannot inspect at line speeds of 1,000+ units/hour.
- Supervised ML (classification/segmentation) works well but requires labeled defect images. In manufacturing, defects are rare by design. You might see 1 defective part per 1,000 -- not enough to train a classifier.
Anomaly detection solves this by learning only from normal images. The model learns what "good" looks like, then flags anything that deviates. No defect labels needed.
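The memory-bank methods covered below (PatchCore in particular) make this concrete: embed the normal training images, store those embeddings, and score a new part by its distance to the nearest stored "good" examples. Here is a toy numpy sketch of the idea, with random vectors standing in for real CNN features:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Normal" training features: the model only ever sees good parts
normal_features = rng.normal(loc=0.0, scale=1.0, size=(200, 16))

def anomaly_score(x: np.ndarray, bank: np.ndarray, k: int = 5) -> float:
    """Score = mean distance to the k nearest normal samples."""
    dists = np.linalg.norm(bank - x, axis=1)
    return float(np.sort(dists)[:k].mean())

good_part = rng.normal(0.0, 1.0, size=16)       # looks like the training data
defective_part = rng.normal(6.0, 1.0, size=16)  # deviates from "good"

print(anomaly_score(good_part, normal_features) <
      anomaly_score(defective_part, normal_features))  # True
```

A part that resembles the normal set sits close to the bank and scores low; anything that deviates, in any way, scores high, which is why no defect labels are needed.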
The MVTec AD Benchmark
MVTec Anomaly Detection (MVTec AD) is the standard benchmark for unsupervised anomaly detection in industrial inspection. Released in 2019 by MVTec Software GmbH, it contains 5,354 high-resolution images across 15 categories of real-world industrial products and textures.
| Category | Type | Train (Normal) | Test (Normal) | Test (Anomalous) | Defect Types |
|---|---|---|---|---|---|
| Bottle | object | 209 | 20 | 63 | Broken large, broken small, contamination |
| Cable | object | 224 | 58 | 92 | Bent wire, cable swap, cut inner/outer, missing cable, poke |
| Capsule | object | 219 | 23 | 109 | Crack, faulty imprint, poke, scratch, squeeze |
| Carpet | texture | 280 | 28 | 89 | Color, cut, hole, metal contamination, thread |
| Hazelnut | object | 391 | 40 | 70 | Crack, cut, hole, print |
| Leather | texture | 245 | 32 | 92 | Color, cut, fold, glue, poke |
| Metal Nut | object | 220 | 22 | 93 | Bent, color, flip, scratch |
| Pill | object | 267 | 26 | 141 | Color, combined, contamination, crack, faulty imprint, scratch, type |
| Screw | object | 320 | 41 | 119 | Manipulated front, scratch head/neck, thread side/top |
| Tile | texture | 230 | 33 | 84 | Crack, glue strip, gray stroke, oil, rough |
| Toothbrush | object | 60 | 12 | 30 | Defective |
| Transistor | object | 213 | 60 | 40 | Bent lead, cut lead, damaged case, misplaced |
| Wood | texture | 247 | 19 | 60 | Color, combined, hole, liquid, scratch |
| Zipper | object | 240 | 32 | 119 | Broken teeth, combined, fabric border, fabric interior, rough, split teeth, squeezed teeth |
| Grid | texture | 264 | 21 | 57 | Bent, broken, glue, metal contamination, thread |
Image-level AUROC
Binary classification: is this image normal or anomalous? Measured as Area Under the ROC Curve. A score of 99.8% means the model almost perfectly separates good parts from defective ones.
Pixel-level AUROC
Localization: can the model pinpoint where the defect is? Each pixel is scored as normal or anomalous. Critical for operators who need to see exactly what went wrong.
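Both metrics are the same statistic at different granularity: AUROC is the probability that a randomly chosen anomalous sample scores higher than a randomly chosen normal one. A minimal numpy sketch (image-level over per-image scores; pixel-level runs the identical computation over flattened masks and anomaly maps):

```python
import numpy as np

def auroc(labels: np.ndarray, scores: np.ndarray) -> float:
    """AUROC = probability that a random anomalous sample outscores
    a random normal one (ties count half)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Image-level: one label and one score per image
labels = np.array([0, 0, 0, 0, 1, 1])          # 0 = normal, 1 = defective
scores = np.array([0.05, 0.10, 0.20, 0.15, 0.90, 0.12])
print(auroc(labels, scores))  # 0.75 -- one defect scored below two normals
```

In production, `sklearn.metrics.roc_auc_score` computes the same value; the pairwise form above is just the most direct way to see what 99.8% actually claims.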
Model Comparison on MVTec AD
All scores are mean AUROC (%) across 15 MVTec AD categories. FPS measured on NVIDIA A100 at 256x256 resolution.
| Model | Year | Image AUROC | Pixel AUROC | FPS | Approach | Training Data |
|---|---|---|---|---|---|---|
| PatchCore | 2022 | 99.6% | 98.1% | ~5-12 | Memory bank + k-NN | Few normal samples |
| EfficientAD | 2024 | 99.8% | 98.8% | ~50-80 | Student-teacher + autoencoder | Normal samples only |
| SimpleNet | 2023 | 99.6% | 98.1% | ~70-85 | Feature adaptor + discriminator | Normal samples only |
| DRAEM | 2021 | 98.0% | 97.3% | ~25-40 | Reconstruction + synthetic anomalies | Normal + synthetic defects |
| AnomalyGPT | 2024 | 96.3% | 95.2% | ~2-5 | LVLM + in-context learning | Zero-shot or few-shot |
| WinCLIP | 2023 | 95.2% | 93.8% | ~15-25 | CLIP + window-based scoring | Zero-shot (text prompts) |
PatchCore
2022 · Towards Total Recall in Industrial Anomaly Detection
High accuracy with minimal data, simple to deploy
Memory grows linearly with coreset size; slow at scale
EfficientAD
2024 · EfficientAD: Accurate Visual Anomaly Detection at Millisecond-Level Latencies
Best accuracy-speed tradeoff; real-time capable
Requires careful hyperparameter tuning per category
SimpleNet
2023 · SimpleNet: A Simple Network for Image Anomaly Detection and Localization
Extremely fast inference; lightweight architecture
Slightly lower pixel-level localization on textures
DRAEM
2021 · DRAEM: A Discriminatively Trained Reconstruction Embedding for Surface Anomaly Detection
Generates its own training anomalies; no real defect data needed
Synthetic anomalies may not match real defect distributions
AnomalyGPT
2024 · AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models
Natural language explanations of defects; zero-shot capable
Slow inference; requires large GPU; lower raw accuracy
WinCLIP
2023 · WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
No training images needed at all; prompt-based
Accuracy gap vs trained methods; struggles with subtle defects
Zero-Shot vs Few-Shot Approaches
The biggest practical question in manufacturing AD: how many normal images do you need?
Zero-Shot
Models like WinCLIP use text prompts ("a photo of a damaged bottle") and vision-language pretraining. No product-specific training.
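The core scoring idea can be sketched without any training: embed the image and two text prompts with a CLIP-style encoder, then softmax the cosine similarities. The embeddings below are mock vectors standing in for real encoder outputs, and the temperature value is illustrative; WinCLIP additionally scores overlapping windows for localization.

```python
import numpy as np

def zero_shot_anomaly_prob(img_emb, normal_emb, anomalous_emb, temp=100.0):
    """CLIP-style scoring: cosine similarity of the image embedding to
    'normal' vs 'anomalous' text prompts, softmaxed with a temperature."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(img_emb, normal_emb), cos(img_emb, anomalous_emb)])
    exp = np.exp(temp * sims)
    return float(exp[1] / exp.sum())  # probability of "anomalous"

# Mock embeddings (real ones come from a CLIP text/image encoder)
normal_prompt = np.array([1.0, 0.0, 0.0])     # "a photo of a flawless bottle"
anomalous_prompt = np.array([0.0, 1.0, 0.0])  # "a photo of a damaged bottle"
damaged_image = np.array([0.1, 0.9, 0.0])

print(zero_shot_anomaly_prob(damaged_image, normal_prompt, anomalous_prompt) > 0.99)  # True
```

Because nothing product-specific is learned, accuracy hinges entirely on how well the pretrained encoder separates "flawless" from "damaged" for your product, which is where the 4-5% gap comes from.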
Few-Shot (1-16 images)
PatchCore in the 2-4-shot regime achieves ~97% AUROC. AnomalyGPT with in-context examples reaches ~96%. Practical for new-product onboarding.
Full Training (200+ images)
Standard unsupervised training with full normal dataset. EfficientAD and PatchCore both exceed 99.5% AUROC with adequate normal samples.
Practical recommendation
Start with zero-shot (WinCLIP) to validate the concept. Collect 10-50 normal images from the line and switch to PatchCore for a quick accuracy boost. Once you have 200+ images (usually 1-2 days of production), train EfficientAD for the final deployment. This staged approach lets you demonstrate value within days while building toward peak accuracy.
Deployment: Edge vs Cloud
Where you run inference matters as much as which model you pick. Latency budgets, data sovereignty, and cost structure all depend on deployment topology.
Edge (NVIDIA Jetson / Hailo)
Best for: High-speed lines, air-gapped facilities
- + No network dependency
- + Lowest latency
- + Data stays on-premise
- + Scales with line count
- - Limited model size
- - Harder to update models
- - Per-unit hardware cost
On-premise GPU Server
Best for: Multi-line facilities, mixed workloads
- + Full model flexibility
- + Centralized management
- + Shared across lines
- + Easy model updates
- - Network latency to cameras
- - Single point of failure
- - Upfront capex
Cloud (AWS/GCP)
Best for: Prototyping, low-volume, multi-site aggregation
- + No hardware investment
- + Auto-scaling
- + Latest models available
- + Central dashboard
- - Network dependency
- - Data privacy concerns
- - Ongoing opex
- - Not viable for high-speed lines
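A quick arithmetic check explains the last point. Throughput is set by the inter-arrival time between parts, but the hard latency constraint is usually the reject gate a fixed distance downstream of the camera. Using the pharmaceutical line rate from the ROI table below, with illustrative (assumed) gate and inference latencies:

```python
# Throughput budget: average time between parts on the line
units_per_hour = 5_000                # pharmaceutical line rate from the ROI table
inter_arrival_ms = 3_600_000 / units_per_hour
print(inter_arrival_ms)  # 720.0

# Hard deadline: hypothetical reject gate 50 ms downstream of the camera
deadline_ms = 50
edge_latency_ms = 15                  # illustrative on-device ONNX inference
cloud_latency_ms = 15 + 80            # illustrative inference + network round trip

print(edge_latency_ms <= deadline_ms)   # True  -- edge makes the gate
print(cloud_latency_ms <= deadline_ms)  # False -- cloud misses it
```

Both topologies keep up with average throughput; only the edge deployment reliably hits the per-part decision deadline, which is why high-speed lines rule out cloud inference.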
Code: From Training to Production
Complete workflow using anomalib, ONNX Runtime, and FastAPI.
```python
# PatchCore training and evaluation with anomalib
from anomalib.data import MVTec
from anomalib.engine import Engine
from anomalib.models import Patchcore

# 1. Configure model
model = Patchcore(
    backbone="wide_resnet50_2",
    layers=["layer2", "layer3"],
    coreset_sampling_ratio=0.1,  # keep 10% of patches in the memory bank
    num_neighbors=9,
)

# 2. Load MVTec AD dataset (or your custom dataset)
datamodule = MVTec(
    root="./datasets/MVTec",
    category="bottle",
    image_size=(256, 256),
    train_batch_size=32,
    eval_batch_size=32,
)

# 3. Train (builds the memory bank from normal images)
engine = Engine(max_epochs=1)  # PatchCore needs only 1 epoch
engine.fit(model=model, datamodule=datamodule)

# 4. Evaluate
results = engine.test(model=model, datamodule=datamodule)
# Returns: image_AUROC, pixel_AUROC, F1, PRO score
```
```python
# Export the trained model to ONNX for edge deployment
from anomalib.deploy import ExportType

engine.export(
    model=model,
    export_type=ExportType.ONNX,
    export_root="./exported_models/bottle_patchcore",
)
# Note: INT8 quantization for Jetson is a separate step (e.g. via TensorRT)

# ── Inference on the edge device ──
import numpy as np
import onnxruntime as ort
from PIL import Image

session = ort.InferenceSession(
    "bottle_patchcore/model.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"],
)

def inspect(image_path: str, threshold: float = 0.5):
    img = Image.open(image_path).convert("RGB").resize((256, 256))
    input_array = np.array(img).astype(np.float32) / 255.0
    input_array = np.transpose(input_array, (2, 0, 1))  # HWC → CHW
    input_array = np.expand_dims(input_array, axis=0)   # add batch dim
    outputs = session.run(None, {"input": input_array})
    anomaly_score = outputs[0][0]  # scalar image-level score
    anomaly_map = outputs[1][0]    # pixel-level heatmap
    return {
        "is_defective": float(anomaly_score) > threshold,
        "confidence": float(anomaly_score),
        "heatmap": anomaly_map,
    }
```
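The 0.5 default threshold above is a placeholder: raw anomaly scores are not calibrated probabilities, and the right cutoff depends on the product. One common heuristic, sketched here with illustrative scores, is to collect scores from a held-out set of known-good images and set the threshold a few standard deviations above their mean (anomalib can also fit a threshold for you during validation):

```python
import numpy as np

# Anomaly scores from a held-out set of known-good images (illustrative values)
normal_scores = np.array([0.12, 0.18, 0.15, 0.22, 0.19, 0.16, 0.21, 0.14])

# Set the threshold above essentially all normal scores: mean + 3 sigma
threshold = float(normal_scores.mean() + 3 * normal_scores.std())
print(round(threshold, 3))  # 0.269 -- higher than every normal score seen
```

Tightening or loosening the multiplier trades false alarms against escaped defects; revisit it whenever the line, lighting, or product revision changes.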
```python
# Production API wrapper for anomaly detection
import io
import time
from contextlib import asynccontextmanager

import numpy as np
import onnxruntime as ort
from fastapi import FastAPI, HTTPException, UploadFile
from PIL import Image

models: dict = {}

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load models at startup
    models["bottle"] = ort.InferenceSession("models/bottle.onnx")
    models["capsule"] = ort.InferenceSession("models/capsule.onnx")
    models["metal_nut"] = ort.InferenceSession("models/metal_nut.onnx")
    yield
    models.clear()

app = FastAPI(title="Manufacturing AD API", lifespan=lifespan)

@app.post("/inspect/{product_type}")
async def inspect(product_type: str, file: UploadFile):
    if product_type not in models:
        raise HTTPException(status_code=404, detail=f"No model for {product_type}")
    t0 = time.perf_counter()
    img = Image.open(io.BytesIO(await file.read())).convert("RGB").resize((256, 256))
    input_arr = np.expand_dims(
        np.transpose(np.array(img, dtype=np.float32) / 255.0, (2, 0, 1)),
        axis=0,
    )
    outputs = models[product_type].run(None, {"input": input_arr})
    latency_ms = (time.perf_counter() - t0) * 1000
    return {
        "product_type": product_type,
        "anomaly_score": float(outputs[0][0]),
        "is_defective": float(outputs[0][0]) > 0.5,
        "latency_ms": round(latency_ms, 1),
    }
```

ROI: Building the Business Case
Cost savings come from three sources: reduced manual inspection labor, fewer escaped defects (warranty/recall costs), and higher throughput from automated inspection at line speed.
| Industry | Inspection Rate | Defect Rate | Manual Cost/Unit | AI Cost/Unit | Annual Savings | Payback |
|---|---|---|---|---|---|---|
| PCB Assembly | 1,200 units/hr | 2-5% | $0.08 | $0.002 | $340K | 4 mo |
| Automotive Parts | 300 units/hr | 0.5-1% | $0.25 | $0.01 | $520K | 3 mo |
| Pharmaceutical | 5,000 units/hr | 0.1-0.3% | $0.04 | $0.001 | $890K | 2 mo |
| Textile / Fabric | 50 m/min | 3-8% | $0.15 | $0.005 | $210K | 6 mo |
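The labor-savings component of these figures is easy to sanity-check. Taking the PCB row above with assumed utilization (16 hours/day, 250 days/year, one station); the table's $340K figure presumably reflects different shift patterns and nets out hardware and integration costs:

```python
# Labor-savings sanity check for one PCB inspection station (illustrative)
units_per_year = 1_200 * 16 * 250        # 1,200 units/hr × 16 h/day × 250 days
manual_cost_per_unit = 0.08              # $ per unit, from the table
ai_cost_per_unit = 0.002

labor_savings = units_per_year * (manual_cost_per_unit - ai_cost_per_unit)
print(f"${labor_savings:,.0f}/year")  # $374,400/year
```

Escaped-defect and throughput gains come on top of this, which is why payback periods in the table land in months rather than years.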
Key cost drivers
- Hardware: NVIDIA Jetson Orin Nano ($200) handles EfficientAD at 50+ FPS. One unit per camera. Amortized over 3-5 years.
- Integration: Camera mounting, lighting control, and PLC integration typically cost 2-3x the hardware. Budget $2K-5K per inspection station.
- Maintenance: Model retraining when products change. Budget 2-4 hours of ML engineering per product variant per quarter.
- Hidden savings: Automated inspection generates defect data that feeds back into process engineering. Teams report 15-30% reduction in defect generation within 6 months of deployment.