
Controllable Generation

Generate text with constraints on style, length, structure, or safety guardrails.

How Controllable Text Generation Works

Language models generate text probabilistically. Controllable generation is the art of steering that randomness toward specific styles, formats, and structures while preserving fluency and coherence.


The Problem

A raw language model is like a firehose of text. It produces fluent output, but you have limited control over what comes out.

Without Control

- Output length is unpredictable
- Tone shifts mid-response
- Format varies between calls
- Style inconsistent with brand
- JSON might be malformed
- May ignore instructions

With Control

- Consistent response length
- Stable, predictable tone
- Reliable output format
- On-brand voice every time
- Guaranteed valid JSON
- Follows constraints precisely

The Core Insight

Control happens at two levels: soft control influences the probability distribution through prompts and training, while hard control constrains which tokens can be generated at all. The best systems combine both.

1. Interactive: See Control in Action

Adjusting the demo's sliders shows how different control parameters affect both the system prompt and the generated output; one configuration is captured below. This demonstrates soft control through prompting.

Control Parameters

- Tone: Casual / Neutral / Formal (set to Neutral)
- Length: Brief / Medium / Detailed (set to Medium)
- Sentiment: Critical / Balanced / Positive (set to Balanced)
- Temperature: t=0.0 to t=1.0 (set to t=0.5)

Generated System Prompt

You are a product reviewer. Use balanced, standard language. Your perspective should be balanced, objective. Use moderate length (2-3 sentences).

Example Output

Functional but unremarkable. Does the job.
Note: In production, you'd pass temperature=0.5 and max_tokens based on length preference.
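
Below is a minimal sketch of how the demo's controls might translate into a real API call. The helper function and the length-to-token mapping are illustrative, not part of any library:

from openai import OpenAI

client = OpenAI()

# Illustrative mapping from the demo's length setting to a token budget
LENGTH_TO_MAX_TOKENS = {"brief": 60, "medium": 150, "detailed": 400}

def controlled_review(product: str, tone: str, length: str,
                      sentiment: str, temperature: float) -> str:
    """Hypothetical helper: build a system prompt from control parameters."""
    system_prompt = (
        f"You are a product reviewer. Use a {tone} tone. "
        f"Your perspective should be {sentiment}. "
        f"Keep responses {length} (2-3 sentences for medium)."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=temperature,                  # creativity control
        max_tokens=LENGTH_TO_MAX_TOKENS[length],  # length control
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Review this product: {product}"},
        ],
    )
    return response.choices[0].message.content

print(controlled_review("wireless mouse", "neutral", "medium", "balanced", 0.5))
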
2. Types of Control

Different aspects of generation can be controlled. Understanding what you can control helps you choose the right technique.

Style Control

How the text sounds

- Formal vs casual
- Technical vs simple
- Verbose vs concise
- Brand voice
Method: Prompting, Fine-tuning
Length Control

How much text is generated

- Token limits
- Sentence counts
- Character bounds
- Paragraph structure
Method: max_tokens, Prompting
Format Control

Structure of the output

- JSON/XML/YAML
- Markdown
- Code blocks
- Tables, lists
Method: Constrained decoding
Content Control

What topics and facts appear

- Topic focus
- Required entities
- Excluded content
- Factual grounding
Method: RAG, Prompting, RLHF

Soft Control (Probabilistic)

Influences the model's probability distribution: the model becomes more likely to follow instructions, but compliance is not guaranteed.

- System prompts
- Few-shot examples
- Temperature adjustment
- Fine-tuning

Hard Control (Guaranteed)

Mathematically constrains which tokens can appear. Output structure is guaranteed to match specification.

- JSON schema enforcement
- Regex patterns
- Grammar constraints
- Logit bias/masking
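
Logit bias sits at the boundary between the two levels: some APIs expose it directly. A minimal sketch using the OpenAI logit_bias parameter, with tiktoken to look up token IDs (recent tiktoken versions know gpt-4o; the banned word is purely illustrative):

import tiktoken
from openai import OpenAI

client = OpenAI()

# Find the token ID(s) for a word we want to suppress
enc = tiktoken.encoding_for_model("gpt-4o")
banned_ids = enc.encode(" delve")

# logit_bias maps token IDs to a bias in [-100, 100];
# -100 effectively bans a token from being sampled
response = client.chat.completions.create(
    model="gpt-4o",
    logit_bias={str(tid): -100 for tid in banned_ids},
    messages=[{"role": "user", "content": "How do researchers explore a new topic?"}],
)
print(response.choices[0].message.content)
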
3. Control Methods Compared

From simple prompting to constrained decoding, each method offers different tradeoffs between reliability, effort, and flexibility.

Approximate reliability by method (implementation effort varies; see the tradeoffs below):

- Prompting: 70%
- Fine-Tuning: 85%
- RLHF / DPO: 90%
- Guidance Templates: 95%
- Constrained Decoding: 100%

Prompting: control output through natural language instructions in the system prompt or user message.

Pros:
- No training required
- Flexible
- Works with any model
- Easy to iterate
Cons:
- Instructions may be ignored
- Less precise control
- Uses context window

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": """You are a formal business writer.
            - Use professional language
            - Avoid contractions
            - Keep responses under 100 words
            - Structure with bullet points"""
        },
        {
            "role": "user",
            "content": "Describe the benefits of cloud computing."
        }
    ]
)

print(response.choices[0].message.content)

4. Format Constraints: Guaranteed Structure

Sometimes you need absolute certainty about output format. Constrained decoding achieves this by modifying the model's logits during generation.

JSON Schema

Enforce structured output matching a specific schema

INPUT:
Extract the person's name, age, and occupation
CONSTRAINED OUTPUT:
{
  "name": "Alice Chen",
  "age": 28,
  "occupation": "Software Engineer"
}
Implemented via: OpenAI Structured Outputs or Outlines
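
For the extraction example above, here is a minimal sketch using OpenAI Structured Outputs (strict mode requires additionalProperties: false and every property listed in required):

from openai import OpenAI

client = OpenAI()

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "occupation": {"type": "string"},
    },
    "required": ["name", "age", "occupation"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "person", "schema": schema, "strict": True},
    },
    messages=[{
        "role": "user",
        "content": "Extract name, age, occupation: Alice Chen, 28, software engineer.",
    }],
)
print(response.choices[0].message.content)  # guaranteed to match the schema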

How Constrained Decoding Works

1. Define Schema
Specify JSON Schema, regex, or grammar
2. Build FSM
Convert schema to finite-state machine
3. Mask Logits
Set invalid token logits to -infinity
4. Sample
Only valid tokens can be selected
import numpy as np

# Logit masking (conceptual): forbid tokens the FSM does not allow
logits = np.array([0.2, 0.5, 0.1, 0.8])      # raw model output for 4 tokens
mask = np.array([True, False, True, False])  # valid tokens from FSM
logits[~mask] = -np.inf                      # invalid tokens cannot be sampled
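
Why -infinity? Because softmax turns it into exactly zero probability: exp(-inf) = 0, so a masked token can never be drawn. A toy continuation of the snippet above:

import numpy as np

logits = np.array([0.2, -np.inf, 0.1, -np.inf])  # masked logits from above

# Softmax: exp(-inf) = 0, so invalid tokens get zero probability
probs = np.exp(logits - np.max(logits))
probs /= probs.sum()
print(probs.round(3))  # [0.525 0.    0.475 0.   ]

rng = np.random.default_rng(0)
token = rng.choice(len(probs), p=probs)  # only token 0 or 2 can ever be sampled
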
5. Production Code Examples

Three approaches to controllable generation, from simple prompting to full structural guarantees.

controlled_generation.py (pip install openai)
from openai import OpenAI

client = OpenAI()

# Style control through system prompt
response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.7,  # Creativity control
    max_tokens=150,   # Length control
    messages=[
        {
            "role": "system",
            "content": """You are a technical writer. Follow these guidelines:

            STYLE:
            - Use formal, professional language
            - Avoid first person pronouns
            - Be precise and concise

            FORMAT:
            - Start with a one-sentence summary
            - Use bullet points for lists
            - End with a key takeaway

            TONE:
            - Objective and balanced
            - Evidence-based claims only"""
        },
        {
            "role": "user",
            "content": "Explain the benefits of containerization."
        }
    ]
)

print(response.choices[0].message.content)

Use Prompting When...

- Quick iteration needed
- Style control is primary goal
- Using API-only models
- Structure is flexible

Use Outlines When...

- JSON output must be valid
- Using local/open models
- Schema changes rarely
- No room for format errors
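
For the Outlines route, a minimal sketch of guaranteed-valid JSON from a local model. This is written against the Outlines 0.x API; the library's interface has changed between versions, and the model name is just an example:

import outlines
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int
    occupation: str

# Load a local model and build a JSON-constrained generator
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, Person)

person = generator("Extract as JSON: Alice Chen, 28, software engineer.")
print(person)  # a validated Person instance; the JSON cannot be malformed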

Use Guidance When...

- Complex multi-step generation
- Need interleaved logic
- Building agents/tools
- Want readable templates
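
For the Guidance route, a minimal sketch interleaving fixed template text with constrained generation. This is written against the guidance 0.1-style API; details may differ between versions, and the model name is just an example:

from guidance import models, gen

lm = models.Transformers("microsoft/Phi-3-mini-4k-instruct")

# Fixed text and generated spans interleave in one readable template
lm += "Product: wireless mouse\nRating (1-5): "
lm += gen(name="rating", regex=r"[1-5]")  # hard constraint: one digit, 1-5
lm += "\nOne-sentence review: "
lm += gen(name="review", stop="\n", max_tokens=40)

print(lm["rating"], lm["review"])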

The Control Spectrum

Raw Generation -> Prompting -> Fine-Tuning -> RLHF -> Guidance -> Constrained Decoding

Start simple. Prompting solves 80% of control problems with zero infrastructure. Only move to harder techniques when you need guarantees.

Layer your controls. Use prompting for style, temperature for creativity, and constrained decoding for structure. They combine well.
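
A sketch of all three layers in a single call: a system prompt for style, temperature for creativity, and Structured Outputs for structure (the schema and prompts are illustrative):

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.3,  # creativity: low for a consistent, sober tone
    max_tokens=200,   # length: hard ceiling on output size
    response_format={  # structure: constrained decoding via Structured Outputs
        "type": "json_schema",
        "json_schema": {
            "name": "review",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "summary": {"type": "string"},
                    "rating": {"type": "integer"},
                },
                "required": ["summary", "rating"],
                "additionalProperties": False,
            },
        },
    },
    messages=[
        # style: prompting sets the voice
        {"role": "system", "content": "You are a concise, professional reviewer."},
        {"role": "user", "content": "Review this laptop: fast, light, mediocre battery."},
    ],
)
print(response.choices[0].message.content)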

Use Cases

- Brand-safe copy
- Structured outputs
- Policy-guided responses
- Style transfer

Architectural Patterns

Control Tokens

Use control codes or adapters for style/length.

Constrained Decoding

Beam- or CFG-guided decoding with regex or JSON schemas.

Guardrails + Filters

Post-process with safety/policy models.
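
One way to implement the guardrails pattern is to screen generated text with a moderation model before returning it. A minimal sketch using OpenAI's moderation endpoint as the policy model (the fallback message is illustrative):

from openai import OpenAI

client = OpenAI()

def generate_with_guardrail(prompt: str) -> str:
    """Generate, then post-process with a safety/policy model."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.choices[0].message.content

    # Post-generation filter: flag unsafe output before it reaches the user
    moderation = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    if moderation.results[0].flagged:
        return "Sorry, I can't share that response."  # illustrative fallback
    return text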

Implementations

Open Source

- Guidance / Outlines (MIT): programmatic constrained decoding.
- Guardrails AI (Apache 2.0): policy + output schema enforcement.
- NeMo Guardrails (Apache 2.0): LLM guardrails with YAML policies.

Benchmarks

No benchmark results are tracked yet for this building block.

Quick Facts

- Input: Text
- Output: Text
- Implementations: 3 open source, 0 API
- Patterns: 3 approaches

Submit Results