
Controllable Generation

Generate text with constraints on style, length, structure, or safety guardrails.

How Controllable Text Generation Works

Language models generate text probabilistically. Controllable generation is the art of steering that randomness toward specific styles, formats, and structures while preserving fluency and coherence.


The Problem

A raw language model is like a firehose of text. It produces fluent output, but you have limited control over what comes out.

Without Control

- Output length is unpredictable
- Tone shifts mid-response
- Format varies between calls
- Style inconsistent with brand
- JSON might be malformed
- May ignore instructions

With Control

- Consistent response length
- Stable, predictable tone
- Reliable output format
- On-brand voice every time
- Guaranteed valid JSON
- Follows constraints precisely

The Core Insight

Control happens at two levels: soft control influences the probability distribution through prompts and training, while hard control constrains which tokens can be generated at all. The best systems combine both.

1. Interactive: See Control in Action

Adjusting the demo's sliders shows how different control parameters affect both the system prompt and the generated output; one configuration is captured below. This demonstrates soft control through prompting.

Control Parameters

- Tone: Casual / Neutral / Formal (set to Neutral)
- Length: Brief / Medium / Detailed (set to Medium)
- Sentiment: Critical / Balanced / Positive (set to Balanced)
- Temperature: t=0.0 to t=1.0 (set to t=0.5)

Generated System Prompt

You are a product reviewer. Use balanced, standard language. Your perspective should be balanced, objective. Use moderate length (2-3 sentences).

Example Output

Functional but unremarkable. Does the job.
Note: In production, you'd pass temperature=0.5 and max_tokens based on length preference.
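
Below is a minimal sketch of how the demo's controls might translate into a real API call. The helper function and the length-to-token mapping are illustrative, not part of any library:

from openai import OpenAI

client = OpenAI()

# Illustrative mapping from the demo's length setting to a token budget
LENGTH_TO_MAX_TOKENS = {"brief": 60, "medium": 150, "detailed": 400}

def controlled_review(product: str, tone: str, length: str,
                      sentiment: str, temperature: float) -> str:
    """Hypothetical helper: build a system prompt from control parameters."""
    system_prompt = (
        f"You are a product reviewer. Use a {tone} tone. "
        f"Your perspective should be {sentiment}. "
        f"Keep responses {length} (2-3 sentences for medium)."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=temperature,                  # creativity control
        max_tokens=LENGTH_TO_MAX_TOKENS[length],  # length control
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Review this product: {product}"},
        ],
    )
    return response.choices[0].message.content

print(controlled_review("wireless mouse", "neutral", "medium", "balanced", 0.5))
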
2. Types of Control

Different aspects of generation can be controlled. Understanding what you can control helps you choose the right technique.

Style Control

How the text sounds

- Formal vs casual
- Technical vs simple
- Verbose vs concise
- Brand voice
Method: Prompting, Fine-tuning
Length Control

How much text is generated

- Token limits
- Sentence counts
- Character bounds
- Paragraph structure
Method: max_tokens, Prompting
Format Control

Structure of the output

- JSON/XML/YAML
- Markdown
- Code blocks
- Tables, lists
Method: Constrained decoding
Content Control

What topics and facts appear

- Topic focus
- Required entities
- Excluded content
- Factual grounding
Method: RAG, Prompting, RLHF

Soft Control (Probabilistic)

Influences the model's probability distribution: the model becomes more likely to follow instructions, but compliance is not guaranteed.

- System prompts
- Few-shot examples
- Temperature adjustment
- Fine-tuning

Hard Control (Guaranteed)

Mathematically constrains which tokens can appear. Output structure is guaranteed to match specification.

- JSON schema enforcement
- Regex patterns
- Grammar constraints
- Logit bias/masking
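
Logit bias sits at the boundary between the two levels: some APIs expose it directly. A minimal sketch using the OpenAI logit_bias parameter, with tiktoken to look up token IDs (recent tiktoken versions know gpt-4o; the banned word is purely illustrative):

import tiktoken
from openai import OpenAI

client = OpenAI()

# Find the token ID(s) for a word we want to suppress
enc = tiktoken.encoding_for_model("gpt-4o")
banned_ids = enc.encode(" delve")

# logit_bias maps token IDs to a bias in [-100, 100];
# -100 effectively bans a token from being sampled
response = client.chat.completions.create(
    model="gpt-4o",
    logit_bias={str(tid): -100 for tid in banned_ids},
    messages=[{"role": "user", "content": "How do researchers explore a new topic?"}],
)
print(response.choices[0].message.content)
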
3. Control Methods Compared

From simple prompting to constrained decoding, each method offers different tradeoffs between reliability, effort, and flexibility.

Approximate reliability by method (implementation effort varies; see the tradeoffs below):

- Prompting: 70%
- Fine-Tuning: 85%
- RLHF / DPO: 90%
- Guidance Templates: 95%
- Constrained Decoding: 100%

Prompting: control output through natural language instructions in the system prompt or user message.

Pros:
- No training required
- Flexible
- Works with any model
- Easy to iterate
Cons:
- Instructions may be ignored
- Less precise control
- Uses context window

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": """You are a formal business writer.
            - Use professional language
            - Avoid contractions
            - Keep responses under 100 words
            - Structure with bullet points"""
        },
        {
            "role": "user",
            "content": "Describe the benefits of cloud computing."
        }
    ]
)

print(response.choices[0].message.content)

4. Format Constraints: Guaranteed Structure

Sometimes you need absolute certainty about output format. Constrained decoding achieves this by modifying the model's logits during generation.

JSON Schema

Enforce structured output matching a specific schema

INPUT:
Extract the person's name, age, and occupation
CONSTRAINED OUTPUT:
{
  "name": "Alice Chen",
  "age": 28,
  "occupation": "Software Engineer"
}
Implemented via: OpenAI Structured Outputs or Outlines
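
For the extraction example above, here is a minimal sketch using OpenAI Structured Outputs (strict mode requires additionalProperties: false and every property listed in required):

from openai import OpenAI

client = OpenAI()

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "occupation": {"type": "string"},
    },
    "required": ["name", "age", "occupation"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "person", "schema": schema, "strict": True},
    },
    messages=[{
        "role": "user",
        "content": "Extract name, age, occupation: Alice Chen, 28, software engineer.",
    }],
)
print(response.choices[0].message.content)  # guaranteed to match the schema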

How Constrained Decoding Works

1. Define Schema
Specify JSON Schema, regex, or grammar
2. Build FSM
Convert schema to finite-state machine
3. Mask Logits
Set invalid token logits to -infinity
4. Sample
Only valid tokens can be selected
import numpy as np

# Logit masking (conceptual): forbid tokens the FSM does not allow
logits = np.array([0.2, 0.5, 0.1, 0.8])      # raw model output for 4 tokens
mask = np.array([True, False, True, False])  # valid tokens from FSM
logits[~mask] = -np.inf                      # invalid tokens cannot be sampled
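
Why -infinity? Because softmax turns it into exactly zero probability: exp(-inf) = 0, so a masked token can never be drawn. A toy continuation of the snippet above:

import numpy as np

logits = np.array([0.2, -np.inf, 0.1, -np.inf])  # masked logits from above

# Softmax: exp(-inf) = 0, so invalid tokens get zero probability
probs = np.exp(logits - np.max(logits))
probs /= probs.sum()
print(probs.round(3))  # [0.525 0.    0.475 0.   ]

rng = np.random.default_rng(0)
token = rng.choice(len(probs), p=probs)  # only token 0 or 2 can ever be sampled
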
5. Production Code Examples

Three approaches to controllable generation, from simple prompting to full structural guarantees.

controlled_generation.py (pip install openai)
from openai import OpenAI

client = OpenAI()

# Style control through system prompt
response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.7,  # Creativity control
    max_tokens=150,   # Length control
    messages=[
        {
            "role": "system",
            "content": """You are a technical writer. Follow these guidelines:

            STYLE:
            - Use formal, professional language
            - Avoid first person pronouns
            - Be precise and concise

            FORMAT:
            - Start with a one-sentence summary
            - Use bullet points for lists
            - End with a key takeaway

            TONE:
            - Objective and balanced
            - Evidence-based claims only"""
        },
        {
            "role": "user",
            "content": "Explain the benefits of containerization."
        }
    ]
)

print(response.choices[0].message.content)

Use Prompting When...

- Quick iteration needed
- Style control is primary goal
- Using API-only models
- Structure is flexible

Use Outlines When...

- JSON output must be valid
- Using local/open models
- Schema changes rarely
- No room for format errors
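
For the Outlines route, a minimal sketch of guaranteed-valid JSON from a local model. This is written against the Outlines 0.x API; the library's interface has changed between versions, and the model name is just an example:

import outlines
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int
    occupation: str

# Load a local model and build a JSON-constrained generator
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, Person)

person = generator("Extract as JSON: Alice Chen, 28, software engineer.")
print(person)  # a validated Person instance; the JSON cannot be malformed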

Use Guidance When...

- Complex multi-step generation
- Need interleaved logic
- Building agents/tools
- Want readable templates
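
For the Guidance route, a minimal sketch interleaving fixed template text with constrained generation. This is written against the guidance 0.1-style API; details may differ between versions, and the model name is just an example:

from guidance import models, gen

lm = models.Transformers("microsoft/Phi-3-mini-4k-instruct")

# Fixed text and generated spans interleave in one readable template
lm += "Product: wireless mouse\nRating (1-5): "
lm += gen(name="rating", regex=r"[1-5]")  # hard constraint: one digit, 1-5
lm += "\nOne-sentence review: "
lm += gen(name="review", stop="\n", max_tokens=40)

print(lm["rating"], lm["review"])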

The Control Spectrum

Raw Generation -> Prompting -> Fine-Tuning -> RLHF -> Guidance -> Constrained Decoding

Start simple. Prompting solves 80% of control problems with zero infrastructure. Only move to harder techniques when you need guarantees.

Layer your controls. Use prompting for style, temperature for creativity, and constrained decoding for structure. They combine well.
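
A sketch of all three layers in a single call: a system prompt for style, temperature for creativity, and Structured Outputs for structure (the schema and prompts are illustrative):

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.3,  # creativity: low for a consistent, sober tone
    max_tokens=200,   # length: hard ceiling on output size
    response_format={  # structure: constrained decoding via Structured Outputs
        "type": "json_schema",
        "json_schema": {
            "name": "review",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "summary": {"type": "string"},
                    "rating": {"type": "integer"},
                },
                "required": ["summary", "rating"],
                "additionalProperties": False,
            },
        },
    },
    messages=[
        # style: prompting sets the voice
        {"role": "system", "content": "You are a concise, professional reviewer."},
        {"role": "user", "content": "Review this laptop: fast, light, mediocre battery."},
    ],
)
print(response.choices[0].message.content)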

Use Cases

- Brand-safe copy
- Structured outputs
- Policy-guided responses
- Style transfer

Architectural Patterns

Control Tokens

Use control codes or adapters for style/length.

Constrained Decoding

Beam- or CFG-guided decoding with regex or JSON schemas.

Guardrails + Filters

Post-process with safety/policy models.
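
One way to implement the guardrails pattern is to screen generated text with a moderation model before returning it. A minimal sketch using OpenAI's moderation endpoint as the policy model (the fallback message is illustrative):

from openai import OpenAI

client = OpenAI()

def generate_with_guardrail(prompt: str) -> str:
    """Generate, then post-process with a safety/policy model."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.choices[0].message.content

    # Post-generation filter: flag unsafe output before it reaches the user
    moderation = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    if moderation.results[0].flagged:
        return "Sorry, I can't share that response."  # illustrative fallback
    return text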

Implementations

Open Source

- Guidance / Outlines (MIT): programmatic constrained decoding.
- Guardrails AI (Apache 2.0): policy + output schema enforcement.
- NeMo Guardrails (Apache 2.0): LLM guardrails with YAML policies.

Benchmarks

No benchmark results are tracked yet for this building block.

Quick Facts

- Input: Text
- Output: Text
- Implementations: 3 open source, 0 API
- Patterns: 3 approaches

Submit Results