Type a prompt, get an image. The most consumer-visible AI category and the most fragmented vendor market — buyers Google “best AI image generator” relentlessly and the honest answer is “depends on the job.”
Below: 14 providers compared on cost per image, max resolution, text-in-image rendering, style control, editing, commercial-safe training, and license.
Frontier API · hyperscaler cloud · open weights. Per-image cost normalised to a standard 1024×1024 render.
| Provider / Model | Tier | License | Cost / image | Max res | Text-in-img | Edit | Comm-safe | Speed |
|---|---|---|---|---|---|---|---|---|
| | Frontier | Proprietary API | $10–120/mo | 2048×2048 (upscale to ~4K) | Decent | ✓ | – | ~30–60 s |
| | Frontier | Proprietary API | $0.04–0.17 | 1024×1024 / 1792×1024 | Strong | ✓ | – | ~10–25 s |
| | Frontier | Proprietary API | $0.04–0.06 | Up to 4MP (Ultra) | Strong | ✓ | – | ~6–12 s |
| Ideogram | Frontier | Proprietary API | $0.05–0.08 | 1024×1024 (square / aspect variants) | Strong | ✓ | – | ~6–10 s |
| Recraft | Frontier | Proprietary API | ~$0.04 | 2048×2048 + native SVG output | Strong | ✓ | – | ~10–20 s |
| Adobe Firefly | Frontier | Proprietary API | $5–10/mo | 2048×2048 | Decent | ✓ | ✓ | ~8–15 s |
| | Frontier | Proprietary API | $12–60/mo | 1536×1536 (upscalable) | Decent | ✓ | – | ~5–15 s |
| | Frontier | Proprietary API | $15–90/mo | 1024×1024 | Decent | ✓ | – | ~5–10 s |
| | Cloud | Proprietary API | $0.02–0.05 | 2048×2048 | Strong | ✓ | – | ~4–10 s |
| | Cloud | Proprietary API | $0.008–0.04 | 2048×2048 | Decent | ✓ | – | ~5–12 s |
| | Open | Open weights | Self-host | 1024×1024 native (tile to higher) | Decent | ✓ | – | ~3–8 s on H100 |
| | Open | Open weights | Self-host | Up to 2MP | Strong | ✓ | – | ~5–10 s on H100 |
| | Open | Open weights | Self-host | 1024×1024 | Decent | – | – | ~1–3 s on RTX 4090 |
| | Open | Open weights | Self-host | 2048×2048 | Decent | ✓ | – | ~6–12 s on H100 |
Pricing as of 2026-04. Per-image cost normalised to 1024×1024 standard quality using each vendor’s published rate card; high-resolution and premium tiers cost 2–4×. Subscription products (Midjourney, Leonardo, Playground, Firefly) are listed at monthly plan price; divide by your monthly volume for an effective per-image rate.
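To compare a subscription row against the per-image rows, the division described in the note can be sketched as follows. The function name and the monthly volumes are illustrative assumptions; the plan prices are the two ends of the $10–120/mo row in the table. The sketch ignores plan quotas and speed limits, which a real comparison would need to check.

```python
def effective_cost_per_image(monthly_price_usd: float, images_per_month: int) -> float:
    """Effective per-image rate: monthly plan price divided by monthly volume."""
    if images_per_month <= 0:
        raise ValueError("images_per_month must be positive")
    return monthly_price_usd / images_per_month


# Plan prices come from the $10-120/mo table row; the volumes are made up.
for price, volume in [(10, 200), (120, 200), (120, 5000)]:
    rate = effective_cost_per_image(price, volume)
    print(f"${price}/mo at {volume} images/mo -> ${rate:.3f} per image")
```

At low volume a subscription can cost far more per image than any API row; the comparison only flips once monthly volume is high.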
Image generation is three different markets in a trench coat: quality-first creative, production SaaS API, and self-hosted open weights. Pick by the job, not by the leaderboard.
Best aesthetic quality
Midjourney v7 · Flux Pro 1.1 Ultra
Midjourney still leads on raw aesthetic — lighting, composition, mood. Flux Pro Ultra is the closest API-accessible competitor and follows prompts more literally.
Best prompt adherence
Flux Pro 1.1 · GPT Image 1 · Imagen 4
If you need the model to count objects, place them in the right spatial relationship, and respect specific colors, these three lead. Midjourney aestheticises; these obey.
Text inside the image (signs, posters, UI)
Ideogram 2.0 · Flux Pro · Imagen 4
Ideogram is the specialist — long strings, multi-line, typographic styling. Flux Pro and Imagen 4 are the strong generalists.
Production SaaS image features (avatars, marketing)
GPT Image 1 · Imagen 4 · AWS Titan / Nova Canvas
The workhorse APIs: stable, documented, billable through your existing cloud account. GPT Image 1 wins on instruction following; Imagen 4 wins on cost.
Vector / design / brand assets
Recraft v3 · Adobe Firefly Image 3
Recraft has native SVG output and brand-style controls. Firefly slots into Creative Cloud and Express for non-technical designers.
Commercially safe (large-brand legal exposure)
Adobe Firefly Image 3 · Getty AI
Trained only on licensed and public-domain imagery, with vendor-backed legal indemnity. The only safe answer if your General Counsel reviews the model card.
Game / character / asset pipelines
Leonardo Phoenix · Flux Dev + LoRAs
Leonardo for an in-tool finetuning workflow without leaving the web. Flux Dev + LoRA via ai-toolkit / Replicate for full control over a character/style.
On-prem / self-host / on-device
SD 3.5 Large · Flux Dev · Z-Image-Turbo
SD 3.5 has the deepest tooling. Flux Dev is the highest quality. Z-Image-Turbo runs on a single 16 GB GPU — the only realistic on-device choice.
Cheapest at scale
AWS Titan v2 · Imagen 4 · Z-Image-Turbo (self-host)
Sub-$0.02/image at standard resolution. For very high volume, self-hosting Z-Image-Turbo on owned GPUs wins on unit cost — you pay in serving complexity.
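A rough way to check the self-hosting claim for your own setup is to turn GPU cost and generation time into a unit cost. This is a minimal sketch under stated assumptions: the function name, the $0.50/hour GPU figure, and the 50% utilization are hypothetical placeholders, not quoted prices. The 2 s per image sits inside the ~1–3 s range listed in the table for an RTX 4090.

```python
def self_host_cost_per_image(gpu_usd_per_hour: float,
                             seconds_per_image: float,
                             utilization: float = 1.0) -> float:
    """GPU cost per generated image, ignoring engineering and ops overhead."""
    if gpu_usd_per_hour < 0 or seconds_per_image <= 0:
        raise ValueError("cost must be >= 0 and seconds_per_image must be > 0")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    images_per_hour = 3600 / seconds_per_image * utilization
    return gpu_usd_per_hour / images_per_hour


API_PRICE = 0.02  # the "sub-$0.02/image" ceiling quoted above

# Hypothetical inputs: $0.50/hour amortised GPU cost, 2 s per image, GPU busy half the time.
unit_cost = self_host_cost_per_image(0.50, 2.0, utilization=0.5)
print(f"self-host: ${unit_cost:.5f}/image vs API ceiling ${API_PRICE:.2f}/image")
```

Utilization is the number to be honest about: an idle GPU still costs money, so low or bursty traffic erodes the unit-cost advantage quickly.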
Every vendor’s landing page is a curated highlight reel. Build your own 10-prompt evaluation set covering the failure modes below. Most providers stratify sharply on them, and the “best” model on Twitter often loses on the prompts that matter to your product.
Text rendering
‘A storefront sign that reads “Open 24/7 – espresso & pastries”.’ Most models get the layout but garble the words. Ideogram, Flux Pro, and Imagen 4 lead.
Hands and anatomy
The classic AI tell. Generate a person counting on their fingers, holding scissors, or playing piano. Frontier models are mostly past this; older open-weights models still mangle hands routinely.
Consistency across scenes
Same character or product, 10 different scenes. Critical for asset pipelines. Vanilla APIs fail; Flux Dev + a character LoRA wins.
Counting and spatial relations
‘Three red apples to the left of two green pears on a wooden table.’ Tests whether the model parses the prompt or just produces a pleasant still life.
Real-person likeness
Real-person likeness is intentionally guardrailed by frontier models. Test how your provider handles named individuals: most refuse, some allow it with a consent flow, none should make it easy.
Inpainting and editing
Mask one element, ask the model to replace it. Quality of edge blending and respect for the surrounding scene vary wildly. GPT Image 1 and Flux Fill lead.
Run each prompt 4 times per provider, judge blind, and score for both quality and consistency. A model that nails a prompt 1 in 4 tries is unusable for an asset pipeline; a model that’s decent every time is gold.
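The protocol of running each prompt four times per provider and judging blind can be sketched as a small harness. Everything here is an illustrative assumption rather than a prescribed tool: the function names, the 1–5 rating scale, the pass mark of 4, and the placeholder provider names. The harness hides which provider produced each output, then un-blinds the ratings into a quality score and a consistency score.

```python
import random
from collections import defaultdict
from statistics import mean


def build_blind_queue(prompts, providers, runs=4, seed=0):
    """Shuffle every (provider, prompt, run) output behind an anonymous ID.

    Returns the judging order and the hidden key used to un-blind later.
    """
    entries = [(prov, prompt, run)
               for prov in providers for prompt in prompts for run in range(runs)]
    random.Random(seed).shuffle(entries)
    key = {f"item-{i:04d}": entry for i, entry in enumerate(entries)}
    return list(key), key


def summarize(scores, key, pass_mark=4):
    """Un-blind 1-5 ratings into per-provider quality and consistency."""
    per_prompt = defaultdict(list)  # (provider, prompt) -> ratings
    for item_id, score in scores.items():
        provider, prompt, _run = key[item_id]
        per_prompt[(provider, prompt)].append(score)

    ratings, pass_rates = defaultdict(list), defaultdict(list)
    for (provider, _prompt), vals in per_prompt.items():
        ratings[provider].extend(vals)
        pass_rates[provider].append(sum(v >= pass_mark for v in vals) / len(vals))

    return {
        provider: {
            "mean_quality": mean(ratings[provider]),
            # Consistency: the weakest prompt decides whether a pipeline can rely on it.
            "worst_prompt_pass_rate": min(pass_rates[provider]),
        }
        for provider in ratings
    }


queue, key = build_blind_queue(["storefront sign", "three apples"], ["model-a", "model-b"])
# A human judge would rate each item seeing only its ID; these ratings are fake.
scores = {item: 4 if key[item][0] == "model-a" else (5 if key[item][2] == 0 else 2)
          for item in queue}
print(summarize(scores, key))
```

In the fake data, model-b produces one excellent image per prompt and three poor ones, which is exactly the 1-in-4 pattern that looks good in a gallery and fails in a pipeline.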
The classic image-gen benchmarks — FID against COCO, CLIP-Score, Parti Prompts win rate — were designed when open-source diffusion models trained on LAION were the frontier. They optimise for distributional similarity to a fixed reference set, not for “does this prompt produce a good image.”
Frontier vendors don’t publish FID. Midjourney has never released a benchmark number. The leaderboards that exist (Artificial Analysis, imgsys.org, LMArena’s image arena) are based on human votes: directionally useful, not authoritative.
The metrics that matter in 2026 are human preference win-rate on a held-out prompt set, prompt-adherence accuracy on compositional benchmarks like T2I-CompBench, and consistency at scale (run the same prompt 100 times, what fraction is shippable?).
Build your own eval. The comparison matrix above uses operational axes — cost, resolution, text quality, license — because no published score reliably ranks the current vendors against each other.
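For the consistency-at-scale metric (run the same prompt 100 times, count what is shippable), a raw fraction hides how noisy 100 samples are. One common way to attach an uncertainty range is the Wilson score interval; that choice is an assumption made for this sketch, not something the metrics above prescribe. The 82-of-100 figure is made up.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return center - half, center + half


# Hypothetical result: 82 of 100 generations judged shippable.
low, high = wilson_interval(82, 100)
print(f"shippable fraction 0.82, plausible range {low:.2f} to {high:.2f}")
```

If two providers’ ranges overlap heavily, 100 runs are not enough to call a winner on that prompt.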
The standard prompt sets used in academic image-gen papers. Useful to sanity-check a new model — and to remind yourself that vendor-published examples are cherry-picked from exactly these sets.
Designed to stress-test compositional reasoning — attribute binding, spatial relations, counting, negation. The discriminator between ‘pretty pictures’ and ‘follows your prompt.’
Google’s PartiPrompts, from the Parti paper: broad coverage of object types, abstract concepts, and world knowledge. Widely cited but heavily memorised by frontier models at this point.
DrawBench, from the Imagen paper. Small, hand-curated, deliberately hard: colors, counts, conflicting requirements, rare objects. Still the quickest way to spot-check a new model.
T2I-CompBench: compositional reasoning at scale, covering attribute binding, object relationships, numeracy, and complex compositions. The benchmark Flux and Imagen target most directly.
HEIM, Stanford’s Holistic Evaluation of Image Models: alignment, quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, efficiency. The closest thing to a comprehensive scoreboard.
Commercial use is a legal question, not a quality question. Adobe Firefly is the only frontier model trained exclusively on licensed and public-domain imagery, with vendor-backed legal indemnity. Everyone else has lawsuit exposure (Getty Images v. Stability AI is the bellwether). For a Fortune 500 brand the answer is usually Firefly; for everyone else, accept the risk knowingly.
LoRA + Flux Dev is the go-to for character or brand consistency. Train a LoRA on 20–50 reference images of your character, product, or art style (ai-toolkit, Replicate, fal). It costs ~$5 to train and unlocks the kind of consistency vanilla APIs can’t deliver at any price.
API vs subscription is a deployment decision. Midjourney is subscription-only — beautiful images, no API, no production builds. For anything you ship inside a SaaS, default to GPT Image 1, Flux Pro, or Imagen 4 because they have real APIs, real SLAs, and predictable per-image cost.
Evaluate on YOUR brand prompts, not Parti. Vendor-published examples are exactly the prompts they over-fit on. Write 10 prompts from your actual product domain — your typography, your characters, your aesthetic — and judge blind. The leaderboard winner often loses on your prompts.
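Watermarking matters more than you think. Most APIs attach provenance signals: C2PA Content Credentials metadata (the standard behind Microsoft’s and Adobe’s Content Credentials) or an invisible pixel-level watermark such as Google SynthID. Metadata can be stripped by re-encoding the file, but a pixel-level watermark can’t be removed without quality loss. If your use case needs full anonymity (anything privacy-sensitive), check the watermark policy before you ship, and disclose to your users either way.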
CodeSOTA’s text-to-image comparison is read by engineers and creative leads picking a generator for production. If you represent one of the vendors above — or one we missed — claim the listing to submit verified pricing, sample galleries, latency benchmarks, and a demo link. Free; credibility-gated, not pay-to-play.