Anomaly Detection for Manufacturing
From MVTec benchmarks to production inspection lines
Manual visual inspection catches 80% of defects on a good day. Trained anomaly detection models hit 99.8%. This guide compares six leading approaches, shows you how to deploy them, and gives you the numbers to build the business case.
TL;DR
- Best overall: EfficientAD (99.8% AUROC, real-time speed) for most production lines
- Best accuracy per sample: PatchCore (99.6% with as few as 10 normal images)
- Zero-shot option: WinCLIP if you have zero training images, but expect a 4-5% accuracy gap
- Need explainability? AnomalyGPT gives natural language defect descriptions but is 10-20x slower
- Deployment: ONNX export to Jetson for edge; FastAPI + ONNX Runtime for centralized
Why Manufacturing Needs Anomaly Detection
Traditional quality control relies on rule-based machine vision (thresholding, template matching) or human inspectors. Both break down as product complexity increases:
- Rule-based vision requires explicit programming for every defect type. A new scratch pattern means a new rule. Natural variation in materials triggers false positives.
- Human inspectors fatigue after 20-30 minutes of sustained attention. Accuracy drops from ~95% to ~75% over a shift. They cannot inspect at line speeds of 1,000+ units/hour.
- Supervised ML (classification/segmentation) works well but requires labeled defect images. In manufacturing, defects are rare by design. You might see 1 defective part per 1,000 -- not enough to train a classifier.
Anomaly detection solves this by learning only from normal images. The model learns what "good" looks like, then flags anything that deviates. No defect labels needed.
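The memory-bank methods covered below (PatchCore in particular) make this concrete: embed the normal training images, store those embeddings, and score a new part by its distance to the nearest stored "good" examples. Here is a toy numpy sketch of the idea, with random vectors standing in for real CNN features:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Normal" training features: the model only ever sees good parts
normal_features = rng.normal(loc=0.0, scale=1.0, size=(200, 16))

def anomaly_score(x: np.ndarray, bank: np.ndarray, k: int = 5) -> float:
    """Score = mean distance to the k nearest normal samples."""
    dists = np.linalg.norm(bank - x, axis=1)
    return float(np.sort(dists)[:k].mean())

good_part = rng.normal(0.0, 1.0, size=16)       # looks like the training data
defective_part = rng.normal(6.0, 1.0, size=16)  # deviates from "good"

print(anomaly_score(good_part, normal_features) <
      anomaly_score(defective_part, normal_features))  # True
```

A part that resembles the normal set sits close to the bank and scores low; anything that deviates, in any way, scores high, which is why no defect labels are needed.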
The MVTec AD Benchmark
MVTec Anomaly Detection (MVTec AD) is the standard benchmark for unsupervised anomaly detection in industrial inspection. Released in 2019 by MVTec Software GmbH, it contains 5,354 high-resolution images across 15 categories of real-world industrial products and textures.
| Category | Type | Train (Normal) | Test (Normal) | Test (Anomalous) | Defect Types |
|---|---|---|---|---|---|
| Bottle | object | 209 | 20 | 63 | Broken large, broken small, contamination |
| Cable | object | 224 | 58 | 92 | Bent wire, cable swap, cut inner/outer, missing cable, poke |
| Capsule | object | 219 | 23 | 109 | Crack, faulty imprint, poke, scratch, squeeze |
| Carpet | texture | 280 | 28 | 89 | Color, cut, hole, metal contamination, thread |
| Hazelnut | object | 391 | 40 | 70 | Crack, cut, hole, print |
| Leather | texture | 245 | 32 | 92 | Color, cut, fold, glue, poke |
| Metal Nut | object | 220 | 22 | 93 | Bent, color, flip, scratch |
| Pill | object | 267 | 26 | 141 | Color, combined, contamination, crack, faulty imprint, scratch, type |
| Screw | object | 320 | 41 | 119 | Manipulated front, scratch head/neck, thread side/top |
| Tile | texture | 230 | 33 | 84 | Crack, glue strip, gray stroke, oil, rough |
| Toothbrush | object | 60 | 12 | 30 | Defective |
| Transistor | object | 213 | 60 | 40 | Bent lead, cut lead, damaged case, misplaced |
| Wood | texture | 247 | 19 | 60 | Color, combined, hole, liquid, scratch |
| Zipper | object | 240 | 32 | 119 | Broken teeth, combined, fabric border, fabric interior, rough, split teeth, squeezed teeth |
| Grid | texture | 264 | 21 | 57 | Bent, broken, glue, metal contamination, thread |
Image-level AUROC
Binary classification: is this image normal or anomalous? Measured as Area Under the ROC Curve. A score of 99.8% means the model almost perfectly separates good parts from defective ones.
Pixel-level AUROC
Localization: can the model pinpoint where the defect is? Each pixel is scored as normal or anomalous. Critical for operators who need to see exactly what went wrong.
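Both metrics are the same statistic at different granularity: AUROC is the probability that a randomly chosen anomalous sample scores higher than a randomly chosen normal one. A minimal numpy sketch (image-level over per-image scores; pixel-level runs the identical computation over flattened masks and anomaly maps):

```python
import numpy as np

def auroc(labels: np.ndarray, scores: np.ndarray) -> float:
    """AUROC = probability that a random anomalous sample outscores
    a random normal one (ties count half)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Image-level: one label and one score per image
labels = np.array([0, 0, 0, 0, 1, 1])          # 0 = normal, 1 = defective
scores = np.array([0.05, 0.10, 0.20, 0.15, 0.90, 0.12])
print(auroc(labels, scores))  # 0.75 -- one defect scored below two normals
```

In production, `sklearn.metrics.roc_auc_score` computes the same value; the pairwise form above is just the most direct way to see what 99.8% actually claims.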
Model Comparison on MVTec AD
All scores are mean AUROC (%) across 15 MVTec AD categories. FPS measured on NVIDIA A100 at 256x256 resolution.
| Model | Year | Image AUROC | Pixel AUROC | FPS | Approach | Training Data |
|---|---|---|---|---|---|---|
| PatchCore | 2022 | 99.6% | 98.1% | ~5-12 | Memory bank + k-NN | Few normal samples |
| EfficientAD | 2024 | 99.8% | 98.8% | ~50-80 | Student-teacher + autoencoder | Normal samples only |
| SimpleNet | 2023 | 99.6% | 98.1% | ~70-85 | Feature adaptor + discriminator | Normal samples only |
| DRAEM | 2021 | 98.0% | 97.3% | ~25-40 | Reconstruction + synthetic anomalies | Normal + synthetic defects |
| AnomalyGPT | 2024 | 96.3% | 95.2% | ~2-5 | LVLM + in-context learning | Zero-shot or few-shot |
| WinCLIP | 2023 | 95.2% | 93.8% | ~15-25 | CLIP + window-based scoring | Zero-shot (text prompts) |
PatchCore
2022 · Towards Total Recall in Industrial Anomaly Detection
High accuracy with minimal data, simple to deploy
Memory grows linearly with coreset size; slow at scale
EfficientAD
2024 · EfficientAD: Accurate Visual Anomaly Detection at Millisecond-Level Latencies
Best accuracy-speed tradeoff; real-time capable
Requires careful hyperparameter tuning per category
SimpleNet
2023 · SimpleNet: A Simple Network for Image Anomaly Detection and Localization
Extremely fast inference; lightweight architecture
Slightly lower pixel-level localization on textures
DRAEM
2021 · DRAEM: A Discriminatively Trained Reconstruction Embedding for Surface Anomaly Detection
Generates its own training anomalies; no real defect data needed
Synthetic anomalies may not match real defect distributions
AnomalyGPT
2024 · AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models
Natural language explanations of defects; zero-shot capable
Slow inference; requires large GPU; lower raw accuracy
WinCLIP
2023 · WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
No training images needed at all; prompt-based
Accuracy gap vs trained methods; struggles with subtle defects
Zero-Shot vs Few-Shot Approaches
The biggest practical question in manufacturing AD: how many normal images do you need?
Zero-Shot
Models like WinCLIP use text prompts ("a photo of a damaged bottle") and vision-language pretraining. No product-specific training.
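The core scoring idea can be sketched without any training: embed the image and two text prompts with a CLIP-style encoder, then softmax the cosine similarities. The embeddings below are mock vectors standing in for real encoder outputs, and the temperature value is illustrative; WinCLIP additionally scores overlapping windows for localization.

```python
import numpy as np

def zero_shot_anomaly_prob(img_emb, normal_emb, anomalous_emb, temp=100.0):
    """CLIP-style scoring: cosine similarity of the image embedding to
    'normal' vs 'anomalous' text prompts, softmaxed with a temperature."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(img_emb, normal_emb), cos(img_emb, anomalous_emb)])
    exp = np.exp(temp * sims)
    return float(exp[1] / exp.sum())  # probability of "anomalous"

# Mock embeddings (real ones come from a CLIP text/image encoder)
normal_prompt = np.array([1.0, 0.0, 0.0])     # "a photo of a flawless bottle"
anomalous_prompt = np.array([0.0, 1.0, 0.0])  # "a photo of a damaged bottle"
damaged_image = np.array([0.1, 0.9, 0.0])

print(zero_shot_anomaly_prob(damaged_image, normal_prompt, anomalous_prompt) > 0.99)  # True
```

Because nothing product-specific is learned, accuracy hinges entirely on how well the pretrained encoder separates "flawless" from "damaged" for your product, which is where the 4-5% gap comes from.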
Few-Shot (1-16 images)
PatchCore in the 2-4-shot regime achieves ~97% AUROC. AnomalyGPT with in-context examples reaches ~96%. Practical for new-product onboarding.
Full Training (200+ images)
Standard unsupervised training with full normal dataset. EfficientAD and PatchCore both exceed 99.5% AUROC with adequate normal samples.
Practical recommendation
Start with zero-shot (WinCLIP) to validate the concept. Collect 10-50 normal images from the line and switch to PatchCore for a quick accuracy boost. Once you have 200+ images (usually 1-2 days of production), train EfficientAD for the final deployment. This staged approach lets you demonstrate value within days while building toward peak accuracy.
Deployment: Edge vs Cloud
Where you run inference matters as much as which model you pick. Latency budgets, data sovereignty, and cost structure all depend on deployment topology.
Edge (NVIDIA Jetson / Hailo)
Best for: High-speed lines, air-gapped facilities
- + No network dependency
- + Lowest latency
- + Data stays on-premise
- + Scales with line count
- - Limited model size
- - Harder to update models
- - Per-unit hardware cost
On-premise GPU Server
Best for: Multi-line facilities, mixed workloads
- + Full model flexibility
- + Centralized management
- + Shared across lines
- + Easy model updates
- - Network latency to cameras
- - Single point of failure
- - Upfront capex
Cloud (AWS/GCP)
Best for: Prototyping, low-volume, multi-site aggregation
- + No hardware investment
- + Auto-scaling
- + Latest models available
- + Central dashboard
- - Network dependency
- - Data privacy concerns
- - Ongoing opex
- - Not viable for high-speed lines
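A quick arithmetic check explains the last point. Throughput is set by the inter-arrival time between parts, but the hard latency constraint is usually the reject gate a fixed distance downstream of the camera. Using the pharmaceutical line rate from the ROI table below, with illustrative (assumed) gate and inference latencies:

```python
# Throughput budget: average time between parts on the line
units_per_hour = 5_000                # pharmaceutical line rate from the ROI table
inter_arrival_ms = 3_600_000 / units_per_hour
print(inter_arrival_ms)  # 720.0

# Hard deadline: hypothetical reject gate 50 ms downstream of the camera
deadline_ms = 50
edge_latency_ms = 15                  # illustrative on-device ONNX inference
cloud_latency_ms = 15 + 80            # illustrative inference + network round trip

print(edge_latency_ms <= deadline_ms)   # True  -- edge makes the gate
print(cloud_latency_ms <= deadline_ms)  # False -- cloud misses it
```

Both topologies keep up with average throughput; only the edge deployment reliably hits the per-part decision deadline, which is why high-speed lines rule out cloud inference.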
Code: From Training to Production
Complete workflow using anomalib, ONNX Runtime, and FastAPI.
```python
# PatchCore training and evaluation with anomalib
from anomalib.data import MVTec
from anomalib.engine import Engine
from anomalib.models import Patchcore

# 1. Configure model
model = Patchcore(
    backbone="wide_resnet50_2",
    layers=["layer2", "layer3"],
    coreset_sampling_ratio=0.1,  # keep 10% of patches in the memory bank
    num_neighbors=9,
)

# 2. Load MVTec AD dataset (or your custom dataset)
datamodule = MVTec(
    root="./datasets/MVTec",
    category="bottle",
    image_size=(256, 256),
    train_batch_size=32,
    eval_batch_size=32,
)

# 3. Train (builds the memory bank from normal images)
engine = Engine(max_epochs=1)  # PatchCore needs only 1 epoch
engine.fit(model=model, datamodule=datamodule)

# 4. Evaluate
results = engine.test(model=model, datamodule=datamodule)
# Returns: image_AUROC, pixel_AUROC, F1, PRO score
```
```python
# Export the trained model to ONNX for edge deployment
from anomalib.deploy import ExportType

engine.export(
    model=model,
    export_type=ExportType.ONNX,
    export_root="./exported_models/bottle_patchcore",
)
# Note: INT8 quantization for Jetson is a separate step (e.g. via TensorRT)

# ── Inference on the edge device ──
import numpy as np
import onnxruntime as ort
from PIL import Image

session = ort.InferenceSession(
    "bottle_patchcore/model.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider"],
)

def inspect(image_path: str, threshold: float = 0.5):
    img = Image.open(image_path).convert("RGB").resize((256, 256))
    input_array = np.array(img).astype(np.float32) / 255.0
    input_array = np.transpose(input_array, (2, 0, 1))  # HWC → CHW
    input_array = np.expand_dims(input_array, axis=0)   # add batch dim
    outputs = session.run(None, {"input": input_array})
    anomaly_score = outputs[0][0]  # scalar image-level score
    anomaly_map = outputs[1][0]    # pixel-level heatmap
    return {
        "is_defective": float(anomaly_score) > threshold,
        "confidence": float(anomaly_score),
        "heatmap": anomaly_map,
    }
```
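The 0.5 default threshold above is a placeholder: raw anomaly scores are not calibrated probabilities, and the right cutoff depends on the product. One common heuristic, sketched here with illustrative scores, is to collect scores from a held-out set of known-good images and set the threshold a few standard deviations above their mean (anomalib can also fit a threshold for you during validation):

```python
import numpy as np

# Anomaly scores from a held-out set of known-good images (illustrative values)
normal_scores = np.array([0.12, 0.18, 0.15, 0.22, 0.19, 0.16, 0.21, 0.14])

# Set the threshold above essentially all normal scores: mean + 3 sigma
threshold = float(normal_scores.mean() + 3 * normal_scores.std())
print(round(threshold, 3))  # 0.269 -- higher than every normal score seen
```

Tightening or loosening the multiplier trades false alarms against escaped defects; revisit it whenever the line, lighting, or product revision changes.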
```python
# Production API wrapper for anomaly detection
import io
import time
from contextlib import asynccontextmanager

import numpy as np
import onnxruntime as ort
from fastapi import FastAPI, HTTPException, UploadFile
from PIL import Image

models: dict = {}

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load models at startup
    models["bottle"] = ort.InferenceSession("models/bottle.onnx")
    models["capsule"] = ort.InferenceSession("models/capsule.onnx")
    models["metal_nut"] = ort.InferenceSession("models/metal_nut.onnx")
    yield
    models.clear()

app = FastAPI(title="Manufacturing AD API", lifespan=lifespan)

@app.post("/inspect/{product_type}")
async def inspect(product_type: str, file: UploadFile):
    if product_type not in models:
        raise HTTPException(status_code=404, detail=f"No model for {product_type}")
    t0 = time.perf_counter()
    img = Image.open(io.BytesIO(await file.read())).convert("RGB").resize((256, 256))
    input_arr = np.expand_dims(
        np.transpose(np.array(img, dtype=np.float32) / 255.0, (2, 0, 1)),
        axis=0,
    )
    outputs = models[product_type].run(None, {"input": input_arr})
    latency_ms = (time.perf_counter() - t0) * 1000
    return {
        "product_type": product_type,
        "anomaly_score": float(outputs[0][0]),
        "is_defective": float(outputs[0][0]) > 0.5,
        "latency_ms": round(latency_ms, 1),
    }
```

ROI: Building the Business Case
Cost savings come from three sources: reduced manual inspection labor, fewer escaped defects (warranty/recall costs), and higher throughput from automated inspection at line speed.
| Industry | Inspection Rate | Defect Rate | Manual Cost/Unit | AI Cost/Unit | Annual Savings | Payback |
|---|---|---|---|---|---|---|
| PCB Assembly | 1,200 units/hr | 2-5% | $0.08 | $0.002 | $340K | 4 mo |
| Automotive Parts | 300 units/hr | 0.5-1% | $0.25 | $0.01 | $520K | 3 mo |
| Pharmaceutical | 5,000 units/hr | 0.1-0.3% | $0.04 | $0.001 | $890K | 2 mo |
| Textile / Fabric | 50 m/min | 3-8% | $0.15 | $0.005 | $210K | 6 mo |
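The labor-savings component of these figures is easy to sanity-check. Taking the PCB row above with assumed utilization (16 hours/day, 250 days/year, one station); the table's $340K figure presumably reflects different shift patterns and nets out hardware and integration costs:

```python
# Labor-savings sanity check for one PCB inspection station (illustrative)
units_per_year = 1_200 * 16 * 250        # 1,200 units/hr × 16 h/day × 250 days
manual_cost_per_unit = 0.08              # $ per unit, from the table
ai_cost_per_unit = 0.002

labor_savings = units_per_year * (manual_cost_per_unit - ai_cost_per_unit)
print(f"${labor_savings:,.0f}/year")  # $374,400/year
```

Escaped-defect and throughput gains come on top of this, which is why payback periods in the table land in months rather than years.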
Key cost drivers
- Hardware: NVIDIA Jetson Orin Nano ($200) handles EfficientAD at 50+ FPS. One unit per camera. Amortized over 3-5 years.
- Integration: Camera mounting, lighting control, and PLC integration typically cost 2-3x the hardware. Budget $2K-5K per inspection station.
- Maintenance: Model retraining when products change. Budget 2-4 hours of ML engineering per product variant per quarter.
- Hidden savings: Automated inspection generates defect data that feeds back into process engineering. Teams report 15-30% reduction in defect generation within 6 months of deployment.