Advanced LLM Parameters Guide

Master max tokens, top P, temperature, and deterministic execution for reliable AI behavior.

Overview

Advanced parameters give you fine-grained control over model behavior. Use them to optimize quality, cost, speed, and reproducibility.

Max Tokens

Max tokens controls the maximum length of model output. One token is roughly 4 characters or 0.75 words.

Use Case Guide

Range | Approx. Words | Best For | Description
--- | --- | --- | ---
256-512 | ~200-400 | Brief Q&A | Single-paragraph answers, quick summaries
512-1024 | ~400-750 | Chatbots | Conversational responses, moderate detail
1024-2048 | ~750-1500 | Balanced | Default range for most use cases
2048-4096 | ~1500-3000 | Documentation | Detailed explanations and tutorials
4096-8192 | ~3000-6000 | Analysis | Comprehensive long-form analysis
8192-16384 | ~6000-12000 | Agent Tasks | Multi-step reasoning and complex workflows
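As a quick back-of-the-envelope check on the word counts above, the rule of thumb from the intro (1 token ≈ 4 characters ≈ 0.75 words) can be sketched as two helpers. These are rough estimates only; use a real tokenizer for exact counts.

```javascript
// Rough budget helpers based on the 1 token ≈ 4 chars ≈ 0.75 words heuristic.
const approxTokensFromChars = (chars) => Math.ceil(chars / 4);
const approxWordsFromTokens = (tokens) => Math.round(tokens * 0.75);

console.log(approxWordsFromTokens(2048)); // 1536, near the low end of the 1500-3000 band
```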

Model-Specific Limits

Different models support different maximum output lengths.

Model | Max Output | Max Input Context | Provider
--- | --- | --- | ---
GPT-5.2 | 128,000 | 400,000 | OpenAI
Claude Opus 4.6 | 128,000 | 200,000 | Anthropic
Gemini 2.5 Pro | 65,536 | 1,000,000 | Google
Claude Sonnet 4.5 | 64,000 | 200,000 | Anthropic
GPT-4.1 | 32,768 | 1,000,000 | OpenAI
GPT-4o | 16,384 | 128,000 | OpenAI
GPT-4o-mini | 16,384 | 128,000 | OpenAI
Claude 3.5 Sonnet | 8,192 | 200,000 | Anthropic
Gemini 2.0 Flash | 8,192 | 1,000,000 | Google
Gemini 1.5 Pro | 8,192 | 2,000,000 | Google
GPT-4 Turbo | 4,096 | 128,000 | OpenAI
Claude 3 Opus | 4,096 | 200,000 | Anthropic

Cost & Latency Impact

Higher Cost

Output tokens often cost more than input tokens. Large responses can significantly increase total API spend.

Longer Wait

More generated tokens increase total generation time; without streaming, the user waits for the entire response to finish.

Better Completeness

Higher limits reduce the risk of responses being cut off mid-thought.

Start Low

Start around 1024-2048 and increase only when you observe truncation.

Top P (Nucleus Sampling)

Top P controls randomness by limiting candidate tokens to a cumulative probability mass.

How It Works

Model token probabilities:
Token A: 40%
Token B: 30%
Token C: 20%
Token D: 5%
Token E: 3%
Token F: 2%

With Top P = 0.9, the model samples only from A-C (cumulative probability reaches 90% at C); D-F are excluded.

Value Guide

Value | Behavior | Best For
--- | --- | ---
0.1 | Ultra-focused | Factual Q&A, data extraction
0.3-0.5 | Focused, consistent | Technical docs, code generation
0.7-0.9 | Balanced | Creative writing, brainstorming
0.95-1.0 | Maximum variety | Story writing, ideation

Top P vs Temperature

Aspect | Temperature | Top P
--- | --- | ---
Method | Scales all probabilities | Filters low-probability tokens
Dynamic | No | Yes (adapts to confidence)
Extreme values | Can produce nonsense | Safer boundaries
Use with | Set one OR the other | Set one OR the other
Recommended | Most users | Advanced users
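To make the "scales all probabilities" row concrete, here is a minimal sketch of temperature applied to logits before softmax. This is the standard textbook formulation; the exact internals vary by provider.

```javascript
// Temperature divides logits before softmax: T < 1 sharpens the distribution,
// T > 1 flattens it. Top P instead truncates the finished distribution by
// cumulative mass.
function softmaxWithTemperature(logits, temperature) {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract max for numerical stability
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

console.log(softmaxWithTemperature([2, 1, 0], 0.5)); // peaked distribution
console.log(softmaxWithTemperature([2, 1, 0], 2.0)); // flatter distribution
```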

Seed (Deterministic Execution)

A seed locks sampling randomness. With identical inputs and parameters, the same seed yields consistent outputs.

Why Deterministic Mode?

A/B Testing

Compare prompt variants without randomness as a confounder.

Certification

Approve outputs and keep behavior stable in production workflows.

Debugging

Reproduce errors reliably with identical input and seed.

Compliance

Support reproducible behavior requirements in regulated domains.

How It Works

Seed: 882194
Input: "Write a haiku about AI"

Output (every run, given the same model version and parameters):
Silicon neurons
Learning from endless data
Creativity blooms
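The reproducibility guarantee comes from seeding the sampler's random number generator. As an analogy (not what providers actually run internally), here is mulberry32, a common tiny seeded PRNG: two generators built from the same seed produce identical sequences.

```javascript
// mulberry32: a tiny seeded PRNG. Same seed, same sequence of draws --
// the core idea behind seeded (deterministic) sampling.
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const runA = mulberry32(882194);
const runB = mulberry32(882194);
console.log(runA() === runB()); // true: same seed, same draws
```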

PromptOps Workflow

  1. Create prompt and enable deterministic mode
  2. Execute prompt with locked parameters
  3. Review output quality
  4. Certify approved output (gold snapshot)
  5. Validate future runs against snapshot
  6. Detect drift when outputs differ
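Steps 4-6 above amount to comparing each new run against the certified snapshot. A minimal sketch of that comparison (the function and field names are illustrative, not the PromptOps API):

```javascript
// Compare a fresh run against the certified "gold" snapshot (steps 4-6).
function validateAgainstSnapshot(newOutput, goldSnapshot) {
  if (newOutput === goldSnapshot) {
    return { status: "pass" };
  }
  return { status: "drift", expected: goldSnapshot, actual: newOutput };
}

console.log(validateAgainstSnapshot("haiku v1", "haiku v1").status); // "pass"
console.log(validateAgainstSnapshot("haiku v2", "haiku v1").status); // "drift"
```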

Use Case Matrix

Recommended parameter combinations for common scenarios:

Use Case | Temp | Max Tokens | Top P | Seed | Why
--- | --- | --- | --- | --- | ---
Chatbot | 0.7 | 1024 | 1.0 | No | Conversational and varied
Documentation | 0.3 | 4096 | 0.7 | No | Factual and detailed
Code Generation | 0.2 | 4096 | 0.5 | No | Deterministic and focused
Creative Writing | 0.9 | 8192 | 0.95 | No | Diverse long-form outputs
Data Extraction | 0 | 512 | 0.1 | No | Precise and brief
A/B Testing | 0 | 2048 | 1.0 | Yes | Isolate prompt changes
Certification | 0 | 2048 | 1.0 | Yes | Reproducible compliance
Agent Tasks | 0.5 | 16384 | 0.8 | Optional | Complex long-running tasks
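The matrix rows translate directly into request presets. An illustrative sketch (property names follow common API conventions, not any specific SDK):

```javascript
// Parameter presets mirroring a few rows of the matrix above.
const presets = {
  chatbot:        { temperature: 0.7, maxTokens: 1024, topP: 1.0 },
  codeGeneration: { temperature: 0.2, maxTokens: 4096, topP: 0.5 },
  abTesting:      { temperature: 0,   maxTokens: 2048, topP: 1.0, seed: 882194 },
};

console.log(presets.codeGeneration.temperature); // 0.2
```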

Frequently Asked Questions

Why can't I set max tokens to 100,000?

Model and provider limits cap maximum output. For very long output, split work across multiple requests.

What's the difference between temperature and Top P?

Both control randomness: temperature rescales probabilities; Top P truncates the distribution by cumulative mass.

Do I always need a seed?

No. Use seed when reproducibility is required, such as testing, debugging, or compliance workflows.

Why can output change even with a seed?

Model version changes, parameter drift, or input differences can all change outputs despite the same seed.

What seed value should I use?

Any integer works. The value does not improve quality; it only controls reproducibility.

Can I use higher max tokens for agent tasks?

Yes, but monitor cost and latency. Test lower values first and increase only when needed.

Which providers support deterministic execution?

Support varies by model family and provider. Confirm current deterministic behavior in provider docs before production use.

Best Practices

Start Conservative

Begin with moderate defaults and tune based on measured outcomes.

Monitor Costs

Higher output limits raise costs. Apply tighter limits in development and testing.

Test Systematically

Change one parameter at a time to isolate its effect on output behavior.

Match to Use Case

Use low randomness for factual tasks and higher randomness for ideation tasks.

Check for Truncation

Increase max tokens when responses end abruptly or lose completeness.

Version Lock for Production

Pin exact model versions instead of aliases for stable behavior across releases.

Technical Deep Dive

How Nucleus Sampling Works

Top P removes the low-probability tail and samples from the remaining mass.

Probabilities: [0.5, 0.3, 0.1, 0.05, 0.04, 0.01]
Cumulative:    [0.5, 0.8, 0.9, 0.95, 0.99, 1.0]

Top P = 0.9
Considered: [0.5, 0.3, 0.1]
Excluded:  [0.05, 0.04, 0.01]

This adapts to confidence: focused when certainty is high, broader when uncertainty is high.
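The tail-trimming step above can be sketched as a short function, assuming the probabilities are already sorted in descending order:

```javascript
// Keep the smallest prefix of (descending-sorted) probabilities whose
// cumulative mass reaches topP; everything after it is excluded.
function nucleusFilter(probs, topP) {
  const kept = [];
  let cumulative = 0;
  for (const p of probs) {
    kept.push(p); // at least one token is always kept
    cumulative += p;
    if (cumulative >= topP) break;
  }
  return kept;
}

console.log(nucleusFilter([0.5, 0.3, 0.1, 0.05, 0.04, 0.01], 0.9));
// [0.5, 0.3, 0.1] -- matching the worked example above
```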

Deterministic Execution Under the Hood

// Sketch: `model` and `RandomGenerator` are illustrative stand-ins.
function generate(prompt, seed, maxTokens) {
  const rng = new RandomGenerator(seed); // seeded PRNG: same seed, same draws
  const output = [];
  for (let i = 0; i < maxTokens; i++) {
    const probabilities = model.predictNextToken(prompt, output);
    output.push(rng.sample(probabilities)); // each sampling step is reproducible
  }
  return output;
}

With low temperature and fixed seed, outputs become highly repeatable for identical inputs.

PromptOps Checksum Validation

const crypto = require("crypto");

const config = {
  model: "gpt-4o-2024-08-06",
  temperature: 0,
  maxTokens: 2048,
  topP: 1.0,
  seed: 882194
};

// Hash the full parameter set; any drift in model, parameters,
// or seed changes the checksum.
const checksum = crypto
  .createHash("sha256")
  .update(JSON.stringify(config))
  .digest("hex");

Ready to Take Control?

Open Markdown Studio and tune advanced parameters with deterministic workflows.