Multimodal · the most fragmented vendor market

Text-to-Image.

Type a prompt, get an image. The most consumer-visible AI category and the most fragmented vendor market — buyers Google “best AI image generator” relentlessly and the honest answer is “depends on the job.”

Below: 14 providers compared on cost per image, max resolution, text-in-image rendering, style control, editing, commercial-safe training, and license.

§ 01 · The matrix

14 providers, side by side.

Frontier API · hyperscaler cloud · open weights. Per-image cost normalised to a standard 1024×1024 render.

| Provider | Model | Tier | License | Cost / image | Max res | Text-in-img | Edit | Comm-safe | Speed |
|---|---|---|---|---|---|---|---|---|---|
| Midjourney | Midjourney v7 | Frontier | Proprietary (no public API) | $10–120/mo | 2048×2048 (upscale to ~4K) | Decent | | | ~30–60 s |
| OpenAI | GPT Image 1 · DALL-E 3 | Frontier | Proprietary API | $0.04–0.17 | 1024×1024 / 1792×1024 | Strong | | | ~10–25 s |
| Black Forest Labs | Flux Pro 1.1 · Flux Pro Ultra | Frontier | Proprietary API | $0.04–0.06 | Up to 4MP (Ultra) | Strong | | | ~6–12 s |
| Ideogram | Ideogram 2.0 · 2.0 Turbo | Frontier | Proprietary API | $0.05–0.08 | 1024×1024 (square / aspect variants) | Strong | | | ~6–10 s |
| Recraft | Recraft v3 | Frontier | Proprietary API | ~$0.04 | 2048×2048 + native SVG output | Strong | | | ~10–20 s |
| Adobe | Firefly Image 3 · Image Model 4 | Frontier | Proprietary API | $5–10/mo | 2048×2048 | Decent | | | ~8–15 s |
| Leonardo.Ai | Phoenix · Lightning XL · Kino XL | Frontier | Proprietary API | $12–60/mo | 1536×1536 (upscalable) | Decent | | | ~5–15 s |
| Playground | Playground v3 | Frontier | Proprietary API | $15–90/mo | 1024×1024 | Decent | | | ~5–10 s |
| Google Cloud | Imagen 4 · Imagen 4 Ultra (Vertex AI) | Cloud | Proprietary API | $0.02–0.05 | 2048×2048 | Strong | | | ~4–10 s |
| Amazon Web Services | Titan Image G1 v2 · Nova Canvas | Cloud | Proprietary API | $0.008–0.04 | 2048×2048 | Decent | | | ~5–12 s |
| Stability AI (open) | Stable Diffusion 3.5 Large · Large Turbo | Open | Open weights | Self-host | 1024×1024 native (tile to higher) | Decent | | | ~3–8 s on H100 |
| Black Forest Labs (open) | FLUX.1-dev (open weights) | Open | Open weights | Self-host | Up to 2MP | Strong | | | ~5–10 s on H100 |
| Alibaba (open) | Z-Image-Turbo (6B distilled) | Open | Open weights | Self-host | 1024×1024 | Decent | | | ~1–3 s on RTX 4090 |
| Tencent (open) | HunyuanImage 3 | Open | Open weights | Self-host | 2048×2048 | Decent | | | ~6–12 s on H100 |

Pricing as of 2026-04. Per-image cost is normalised to 1024×1024 standard quality using each vendor’s published rate card; high-resolution and premium tiers cost 2–4× more. Subscription products (Midjourney, Leonardo, Playground, Firefly) are listed at their monthly plan price; divide by your monthly volume for an effective per-image rate.
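The subscription arithmetic is simple enough to sketch. This assumes a flat plan with no image cap; real plans usually meter credits or fast-GPU hours, so treat the result as a lower bound. The $30 plan and $0.04 API price are hypothetical inputs, not quotes from the matrix.

```python
def effective_per_image(plan_usd_per_month: float, images_per_month: int) -> float:
    """Spread a flat monthly plan price over the images you actually generate."""
    if images_per_month <= 0:
        raise ValueError("images_per_month must be positive")
    return plan_usd_per_month / images_per_month


def break_even_volume(plan_usd_per_month: float, api_usd_per_image: float) -> float:
    """Monthly volume above which the flat plan is cheaper than a per-image API."""
    if api_usd_per_image <= 0:
        raise ValueError("api_usd_per_image must be positive")
    return plan_usd_per_month / api_usd_per_image


# Hypothetical figures: a $30/mo plan vs. a $0.04/image API.
print(effective_per_image(30.0, 600))  # 0.05
print(break_even_volume(30.0, 0.04))   # 750.0
```

Below the break-even volume the per-image API is cheaper; above it the plan wins, provided its caps actually let you generate that many images.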

§ 02 · Decision shortcuts

Which should I use?

Image generation is three different markets in a trench coat: quality-first creative, production SaaS API, and self-hosted open weights. Pick by the job, not by the leaderboard.

Best aesthetic quality

Midjourney v7 · Flux Pro 1.1 Ultra

Midjourney still leads on raw aesthetic — lighting, composition, mood. Flux Pro Ultra is the closest API-accessible competitor and follows prompts more literally.

Best prompt adherence

Flux Pro 1.1 · GPT Image 1 · Imagen 4

If you need the model to count objects, place them in the right spatial relationship, and respect specific colors, these three lead. Midjourney aestheticises; these obey.

Text inside the image (signs, posters, UI)

Ideogram 2.0 · Flux Pro · Imagen 4

Ideogram is the specialist — long strings, multi-line, typographic styling. Flux Pro and Imagen 4 are the strong generalists.

Production SaaS image features (avatars, marketing)

GPT Image 1 · Imagen 4 · AWS Titan / Nova Canvas

The workhorse APIs: stable, documented, billable through your existing cloud account. GPT Image 1 wins on instruction following; Imagen 4 undercuts it on cost, and AWS Titan / Nova Canvas is the cheapest of the three.

Vector / design / brand assets

Recraft v3 · Adobe Firefly Image 3

Recraft has native SVG output and brand-style controls. Firefly slots into Creative Cloud and Express for non-technical designers.

Commercially safe (large-brand legal exposure)

Adobe Firefly Image 3 · Getty AI

Trained only on licensed and public-domain imagery, with vendor-backed legal indemnity. The only safe answer if your General Counsel reviews the model card.

Game / character / asset pipelines

Leonardo Phoenix · Flux Dev + LoRAs

Leonardo for an in-tool finetuning workflow that never leaves the browser. Flux Dev + LoRA via ai-toolkit / Replicate for full control over a character or style.

On-prem / self-host / on-device

SD 3.5 Large · Flux Dev · Z-Image-Turbo

SD 3.5 has the deepest tooling. Flux Dev is the highest quality. Z-Image-Turbo fits on a single 16 GB GPU, which makes it the only realistic on-device choice of the three.

Cheapest at scale

AWS Titan v2 · Imagen 4 · Z-Image-Turbo (self-host)

About $0.02/image or less on the cheapest standard-resolution tiers. For very high volume, self-hosting Z-Image-Turbo on owned GPUs wins on unit cost; you pay in serving complexity instead.
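A minimal sketch of the self-host unit-cost calculation, assuming you know your GPU's hourly cost and measured seconds per image. The $0.70/h rate and 50% utilisation below are hypothetical; the 2 s figure sits inside the RTX 4090 range the matrix lists for Z-Image-Turbo. Engineering time, storage, and egress (the "serving complexity") are not included.

```python
def self_host_cost_per_image(gpu_usd_per_hour: float, seconds_per_image: float,
                             utilisation: float = 1.0) -> float:
    """Amortised GPU cost of one image on a rented or owned GPU.

    utilisation is the fraction of paid GPU time spent generating. Idle time is
    still paid for, so lower utilisation raises the effective unit cost.
    """
    if seconds_per_image <= 0:
        raise ValueError("seconds_per_image must be positive")
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    images_per_hour = 3600 / seconds_per_image * utilisation
    return gpu_usd_per_hour / images_per_hour


# Hypothetical: $0.70/h GPU, 2 s per image, busy half the time.
cost = self_host_cost_per_image(0.70, 2.0, utilisation=0.5)
print(f"${cost:.5f} per image")  # $0.00078 per image
```

Utilisation is the lever that usually decides the comparison: a GPU that sits idle most of the day can end up costing more per image than an API.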

§ 03 · Methodology

What to actually test (vendor galleries lie).

Every vendor’s landing page is a curated highlight reel. Build your own 10-prompt evaluation set covering the failure modes below. Providers diverge sharply on them, and the model that is “best” on Twitter often loses on the prompts that matter to your product.

Run each prompt 4 times per provider, judge blind, score for both quality and consistency. A model that nails a prompt 1 in 4 tries is unusable for an asset pipeline; a model that’s decent every time is gold.
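The blind-judging protocol can be sketched in a few lines: shuffle outputs, hide provider names behind opaque IDs, then unblind only when scoring. The 1–5 scale and the "shippable" threshold of 4 are assumptions; "consistency" here means the share of prompts where every run clears that threshold.

```python
import random
from collections import defaultdict
from statistics import mean


def blind_order(samples, seed=0):
    """Shuffle (provider, prompt_id, image) tuples and hide the provider names.

    Returns the judge-facing list of (blind_id, image) and a private key that
    maps each blind_id back to (provider, prompt_id) for scoring afterwards.
    The image payload itself must not leak the provider (e.g. via file names).
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    judge_view, key = [], {}
    for i, (provider, prompt_id, image) in enumerate(shuffled):
        blind_id = f"img-{i:04d}"
        key[blind_id] = (provider, prompt_id)
        judge_view.append((blind_id, image))
    return judge_view, key


def summarise(scores, key, ship_threshold=4):
    """scores maps blind_id -> judge score on a hypothetical 1-5 scale.

    Per provider: mean quality over every run, and consistency = the share of
    prompts where *all* runs scored at or above ship_threshold.
    """
    runs = defaultdict(lambda: defaultdict(list))
    for blind_id, score in scores.items():
        provider, prompt_id = key[blind_id]
        runs[provider][prompt_id].append(score)
    report = {}
    for provider, by_prompt in runs.items():
        every_score = [s for prompt_scores in by_prompt.values() for s in prompt_scores]
        passed = sum(all(s >= ship_threshold for s in prompt_scores)
                     for prompt_scores in by_prompt.values())
        report[provider] = {"mean_quality": mean(every_score),
                            "consistency": passed / len(by_prompt)}
    return report


# One prompt, 4 runs each: A nails it once, B is decent every time.
samples = [(p, "storefront-sign", f"out_{n}.png")
           for n, p in enumerate(["A"] * 4 + ["B"] * 4)]
judge_view, key = blind_order(samples, seed=42)
fake_judge = {"A": iter([5, 2, 2, 2]), "B": iter([4, 4, 4, 4])}  # stand-in for humans
scores = {blind_id: next(fake_judge[key[blind_id][0]]) for blind_id, _ in judge_view}
print(summarise(scores, key))
```

Provider A gets the single best image but a consistency of 0; provider B never scores 5 yet passes every time, which is the profile an asset pipeline wants.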

Text rendering

‘A storefront sign that reads “Open 24/7 — espresso & pastries”.’ Most models get the layout but garble the words. Ideogram, Flux Pro, and Imagen 4 lead.
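If you want a number rather than an eyeball judgment for text rendering, one option is to OCR the image and compute a character error rate (CER) against the requested string. The sketch assumes you already have OCR output from an engine of your choice; it only does the string comparison, using Levenshtein edit distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum single-character insertions, deletions and substitutions from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def character_error_rate(target: str, ocr_text: str) -> float:
    """Edit distance divided by target length, after case/whitespace normalisation."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    t, o = norm(target), norm(ocr_text)
    return levenshtein(t, o) / max(len(t), 1)


# Hypothetical OCR output with one dropped "s" and "i" misread as "l".
print(character_error_rate("espresso & pastries", "Espreso & pastrles"))
```

OCR errors add noise of their own, so a low CER is a useful filter but the final call on garbled lettering still belongs to a human.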

Hands and anatomy

The classic AI tell. Generate a person counting on their fingers, holding scissors, or playing piano. Frontier models are mostly past this; older open-weights models still mangle hands routinely.

Style consistency across batches

Same character or product, 10 different scenes. Critical for asset pipelines. Vanilla APIs fail; Flux Dev + a character LoRA wins.
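One rough way to put a number on batch consistency is the mean pairwise cosine similarity of image embeddings. This sketch assumes you have already embedded each image (cropped to the character or product) with an image encoder such as CLIP or DINOv2; the toy 3-d vectors below stand in for real encoder outputs.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def batch_consistency(embeddings):
    """Mean pairwise cosine similarity across one batch (1.0 = identical)."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy stand-ins for encoder outputs: two matching crops, one off-model.
batch = [[0.9, 0.1, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
print(round(batch_consistency(batch), 3))
```

Compare the score across providers on the same 10 scenes rather than reading it in absolute terms; what counts as "high" depends entirely on the encoder.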

Prompt adherence (counts, colors, spatial)

‘Three red apples to the left of two green pears on a wooden table.’ Tests whether the model parses the prompt or just produces a pleasant still life.
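Given object detections for the generated image, the apples-and-pears prompt becomes a mechanical check. The `Detection` type and its labels are hypothetical; the sketch assumes a detector that emits colour-qualified labels and a bounding-box centre, with x increasing to the right.

```python
from collections import Counter
from typing import NamedTuple


class Detection(NamedTuple):
    label: str       # e.g. "red apple", as emitted by whatever detector you use (assumed)
    x_center: float  # horizontal centre of the bounding box, in pixels


def check_apples_and_pears(detections):
    """Score 'three red apples to the left of two green pears' from detector output."""
    counts = Counter(d.label for d in detections)
    apples = [d.x_center for d in detections if d.label == "red apple"]
    pears = [d.x_center for d in detections if d.label == "green pear"]
    return {
        "apple_count_ok": counts["red apple"] == 3,
        "pear_count_ok": counts["green pear"] == 2,
        # Strict reading of "to the left of": every apple left of every pear.
        "spatial_ok": bool(apples and pears) and max(apples) < min(pears),
    }


dets = [Detection("red apple", 110), Detection("red apple", 160), Detection("red apple", 210),
        Detection("green pear", 420), Detection("green pear", 480)]
print(check_apples_and_pears(dets))
```

Keeping count, colour, and position as separate flags shows which part of the prompt a model dropped, which is more useful than a single pass/fail.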

Faces and likeness

Real-person likeness is intentionally guardrailed by frontier models. Test how your provider handles named individuals — most refuse, some allow with consent flow, none should make it easy.

Editing precision (inpaint / mask)

Mask one element, ask the model to replace it. Quality of edge blending and respect for the surrounding scene varies wildly. GPT Image 1 and Flux Fill lead.
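Respect for the surrounding scene can be measured as how much the pixels *outside* the mask changed. A minimal sketch, assuming the original and edited images are aligned at the same resolution and already converted to grayscale 2-D lists (in practice you would load them with an imaging library); edge blending needs a separate look at a band around the mask boundary.

```python
def outside_mask_drift(original, edited, mask):
    """Mean absolute change of pixels the edit should NOT touch (mask == 0).

    original, edited: same-shape 2-D lists of grayscale values (0-255).
    mask: same-shape 2-D list of 0/1, where 1 marks the region to replace.
    """
    if not (len(original) == len(edited) == len(mask)):
        raise ValueError("images and mask must have the same number of rows")
    total, count = 0, 0
    for row_o, row_e, row_m in zip(original, edited, mask):
        if not (len(row_o) == len(row_e) == len(row_m)):
            raise ValueError("images and mask must have the same number of columns")
        for o, e, m in zip(row_o, row_e, row_m):
            if not m:
                total += abs(o - e)
                count += 1
    return total / count if count else 0.0


original = [[10, 10], [10, 10]]
edited = [[200, 12], [10, 10]]  # masked pixel replaced; one neighbour drifted by 2
mask = [[1, 0], [0, 0]]
print(outside_mask_drift(original, edited, mask))
```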

§ 04 · Metrics

Why image-gen benchmarks lag the production frontier.

The classic image-gen benchmarks (FID against COCO, CLIP-Score, Parti Prompts win rate) date from the research era, when GANs and then open-source diffusion models trained on LAION set the pace. FID rewards distributional similarity to a fixed reference image set and CLIP-Score rewards coarse text-image embedding agreement; neither directly asks “does this prompt produce a good image.”
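For reference, FID fits a Gaussian to Inception-v3 features of each image set and compares them: $(\mu_r, \Sigma_r)$ come from the reference images (COCO here) and $(\mu_g, \Sigma_g)$ from the generated ones. The prompt appears nowhere in the formula, which is why a model can score well on FID while ignoring what it was asked for.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```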

Frontier vendors don’t publish FID. Midjourney has never released a benchmark number. The leaderboards that exist (Artificial Analysis, imgsys.org, lmarena image arena) are human-vote-based — directionally useful, not authoritative.
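Human-vote boards turn pairwise "A or B?" votes into a ranking. The classic way to do that is an Elo update, sketched below; the starting rating of 1000 and K = 32 are conventional assumptions, not any specific board's parameters, and ties are ignored. Online Elo depends on the order votes arrive in, which is one reason some arenas instead fit a Bradley–Terry model over all votes at once.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def record_vote(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one pairwise vote in place; k (assumed 32) sets the step size."""
    gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain


ratings = {"model_a": 1000.0, "model_b": 1000.0}
for winner, loser in [("model_a", "model_b")] * 3 + [("model_b", "model_a")]:
    record_vote(ratings, winner, loser)
print(ratings)
```

An upset (a low-rated model beating a high-rated one) moves ratings more than an expected win, so a handful of noisy votes can swing a board with few votes per model.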

The metrics that matter in 2026 are human preference win-rate on a held-out prompt set, prompt-adherence accuracy on compositional benchmarks like T2I-CompBench, and consistency at scale (run the same prompt 100 times, what fraction is shippable?).
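Both win-rate and "fraction shippable" are binomial proportions, so they deserve a confidence interval before you act on them. A minimal sketch using the Wilson score interval, which behaves better than the plain normal approximation near 0 or 1 and at small sample sizes; counting only decisive comparisons (ties excluded) is an assumption.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


# Hypothetical: your model won 58 of 100 decisive blind comparisons.
lo, hi = wilson_interval(58, 100)
print(f"win rate 0.58, 95% CI [{lo:.3f}, {hi:.3f}]")

# Hypothetical: 83 of 100 runs of the same prompt judged shippable.
lo, hi = wilson_interval(83, 100)
print(f"shippable 0.83, 95% CI [{lo:.3f}, {hi:.3f}]")
```

If a win-rate interval still includes 0.5, the "winner" has not been established; collect more votes before switching providers.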

Build your own eval. The comparison matrix above uses operational axes — cost, resolution, text quality, license — because no published score reliably ranks the current vendors against each other.

§ 05 · Reference benchmarks

The boards that matter.

The standard prompt sets used in academic image-gen papers. Useful to sanity-check a new model — and to remind yourself that vendor-published examples are cherry-picked from exactly these sets.

GenAI-Bench

1,600 prompts · 16 compositional skills · 2024

Designed to stress-test compositional reasoning — attribute binding, spatial relations, counting, negation. The discriminator between ‘pretty pictures’ and ‘follows your prompt.’


Parti Prompts

1,632 English prompts · 12 categories · 2022

Google's benchmark from the Parti paper — broad coverage of object types, abstract concepts, world knowledge. Widely cited but heavily memorised by frontier models at this point.


DrawBench

200 challenging prompts · 11 categories · 2022

From the Imagen paper. Small, hand-curated, deliberately hard — colors, counts, conflicting requirements, rare objects. Still the quickest way to spot-check a new model.


T2I-CompBench

6,000 prompts · 6 composition categories · 2023

Compositional reasoning at scale: attribute binding, object relationships, numeracy, complex compositions. The benchmark Flux and Imagen target most directly.


HEIM (Stanford CRFM)

Holistic eval · 12 aspects · multiple datasets · 2023

Stanford's Holistic Evaluation of Image Models — alignment, quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, efficiency. The closest thing to a comprehensive scoreboard.

§ 06 · Practical tips

Five rules for shipping image gen in 2026.

Commercial use is a legal question, not a quality question. Adobe Firefly is the only frontier model in the matrix trained exclusively on licensed and public-domain imagery, and it comes with vendor-backed legal indemnity. Everyone else has lawsuit exposure (Getty v. Stability is the bellwether). For a Fortune 500 brand the answer is usually Firefly; for everyone else, accept the risk knowingly.

LoRA + Flux Dev is the go-to for character or brand consistency. Train a LoRA on 20–50 reference images of your character, product, or art style (ai-toolkit, Replicate, fal). It costs ~$5 to train and unlocks the kind of consistency vanilla APIs can’t deliver at any price.
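Why LoRA training is so cheap: the base weight matrix W stays frozen, and only two thin matrices B (d_out × r) and A (r × d_in) are learned, giving the update W + (α / r)·B·A. The sketch counts the trainable parameters and applies the merge on a tiny matrix. The 3072 × 3072 layer and rank 16 are hypothetical values chosen for illustration, not a claim about Flux's architecture or any trainer's defaults.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


def merge_lora(W, B, A, alpha: float, rank: int):
    """Return W + (alpha / rank) * B @ A using plain lists (small-matrix demo)."""
    scale = alpha / rank
    return [
        [W[i][j] + scale * sum(B[i][r] * A[r][j] for r in range(rank))
         for j in range(len(W[0]))]
        for i in range(len(W))
    ]


# Hypothetical 3072 x 3072 projection with a rank-16 adapter.
full = 3072 * 3072
lora = lora_trainable_params(3072, 3072, 16)
print(lora, f"{lora / full:.2%}")  # 98304 1.04%
```

Training about 1% of each adapted layer is what keeps a 20–50 image run cheap, and the small adapter file can be swapped per character or brand on the same base model.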

API vs subscription is a deployment decision. Midjourney is subscription-only: beautiful images, but no public API, so nothing you can call from production code. For anything you ship inside a SaaS, default to GPT Image 1, Flux Pro, or Imagen 4, because they have real APIs, real SLAs, and predictable per-image cost.

Evaluate on YOUR brand prompts, not Parti. Vendor-published examples are exactly the prompts they overfit to. Write 10 prompts from your actual product domain (your typography, your characters, your aesthetic) and judge blind. The leaderboard winner often loses on your prompts.

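Watermarking matters more than you think. Most APIs mark their output, either with C2PA provenance metadata (surfaced as Content Credentials by Adobe and Microsoft) or with an invisible pixel-level watermark such as Google SynthID. Metadata is easy to strip, but pixel-level watermarks are designed to survive edits and can’t be removed without quality loss. If your use case needs full anonymity (anything privacy-sensitive), check the watermark policy before you ship, and disclose to your users either way.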

For vendors

Run an image-gen product? Claim your listing.

CodeSOTA’s text-to-image comparison is read by engineers and creative leads picking a generator for production. If you represent one of the vendors above — or one we missed — claim the listing to submit verified pricing, sample galleries, latency benchmarks, and a demo link. Free; credibility-gated, not pay-to-play.
