Advanced LLM Parameters Guide
Master max tokens, top P, temperature, and deterministic execution for reliable AI behavior.
Overview
Advanced parameters give you fine-grained control over model behavior. Use them to optimize quality, cost, speed, and reproducibility.
Max Tokens
Max tokens controls the maximum length of model output. One token is roughly 4 characters or 0.75 words.
Use Case Guide
| Range | Approx. Words | Best For | Description |
|---|---|---|---|
| 256-512 | ~200-400 | Brief Q&A | Single paragraph answers, quick summaries |
| 512-1024 | ~400-750 | Chatbots | Conversational responses, moderate detail |
| 1024-2048 | ~750-1500 | Balanced | Default range for most use cases |
| 2048-4096 | ~1500-3000 | Documentation | Detailed explanations and tutorials |
| 4096-8192 | ~3000-6000 | Analysis | Comprehensive long-form analysis |
| 8192-16384 | ~6000-12000 | Agent Tasks | Multi-step reasoning and complex workflows |
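Using the ~0.75 words-per-token rule of thumb above, a rough budgeting helper can be sketched. This is illustrative only; real token counts vary by model and tokenizer, and the function names here are hypothetical.

```javascript
// Rough token budgeting from the ~0.75 words-per-token heuristic.
// Not a real tokenizer; use your provider's tokenizer for exact counts.
function wordsToTokens(wordCount) {
  return Math.ceil(wordCount / 0.75);
}

function tokensToWords(tokenCount) {
  return Math.floor(tokenCount * 0.75);
}

// Example: budgeting max tokens for a ~1,200-word documentation section.
const maxTokens = wordsToTokens(1200); // 1600 tokens
```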
Model-Specific Limits
Different models support different maximum output lengths.
| Model | Max Output | Max Input Context | Provider |
|---|---|---|---|
| GPT-5.2 | 128,000 | 400,000 | OpenAI |
| Claude Opus 4.6 | 128,000 | 200,000 | Anthropic |
| Gemini 2.5 Pro | 65,536 | 1,000,000 | Google |
| Claude Sonnet 4.5 | 64,000 | 200,000 | Anthropic |
| GPT-4.1 | 32,768 | 1,000,000 | OpenAI |
| GPT-4o | 16,384 | 128,000 | OpenAI |
| GPT-4o-mini | 16,384 | 128,000 | OpenAI |
| Claude 3.5 Sonnet | 8,192 | 200,000 | Anthropic |
| Gemini 2.0 Flash | 8,192 | 1,000,000 | Google |
| Gemini 1.5 Pro | 8,192 | 2,000,000 | Google |
| GPT-4 Turbo | 4,096 | 128,000 | OpenAI |
| Claude 3 Opus | 4,096 | 200,000 | Anthropic |
Cost & Latency Impact
Higher Cost
Output tokens often cost more than input tokens. Large responses can significantly increase total API spend.
Longer Wait
More generated tokens increase latency, especially for streaming responses.
Better Completeness
Higher limits reduce the risk of responses being cut off mid-thought.
Start Low
Start around 1024-2048 and increase only when you observe truncation.
Top P (Nucleus Sampling)
Top P controls randomness by limiting candidate tokens to a cumulative probability mass.
How It Works
Suppose the model assigns these next-token probabilities: Token A: 40%, Token B: 30%, Token C: 20%, Token D: 5%, Token E: 3%, Token F: 2%. With Top P = 0.9, the cumulative mass reaches 90% at Token C, so the model samples only from A-C and discards D-F.
Value Guide
| Value | Behavior | Best For |
|---|---|---|
| 0.1 | Ultra-focused | Factual Q&A, data extraction |
| 0.3-0.5 | Focused and consistent | Technical docs, code generation |
| 0.7-0.9 | Balanced | Creative writing, brainstorming |
| 0.95-1.0 | Maximum variety | Story writing, ideation |
Top P vs Temperature
| Aspect | Temperature | Top P |
|---|---|---|
| Method | Scales all probabilities | Filters low-probability tokens |
| Dynamic | No | Yes (adapts to confidence) |
| Extreme values | Can produce nonsense | Safer boundaries |
| Use with | Set one OR the other | Set one OR the other |
| Recommended | Most users | Advanced users |
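The mechanical difference in the table can be sketched in a few lines. Temperature raises every probability to the power 1/T and renormalizes, so lowering T sharpens the whole distribution rather than cutting tokens off. The function name is illustrative, not a real SDK call.

```javascript
// Illustrative temperature rescaling: p_i -> p_i^(1/T), renormalized.
// T < 1 sharpens the distribution; T > 1 flattens it; T = 1 is identity.
function applyTemperature(probs, temperature) {
  const scaled = probs.map(p => Math.pow(p, 1 / temperature));
  const total = scaled.reduce((a, b) => a + b, 0);
  return scaled.map(s => s / total);
}

const probs = [0.5, 0.3, 0.2];
applyTemperature(probs, 0.5); // sharper: top token's share rises above 0.5
applyTemperature(probs, 2.0); // flatter: probability mass spreads out
```

Contrast this with Top P, which leaves the kept probabilities untouched and simply discards the low-probability tail.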
Seed (Deterministic Execution)
A seed locks sampling randomness. With identical inputs and parameters, the same seed yields consistent outputs.
Why Deterministic Mode?
A/B Testing
Compare prompt variants without randomness as a confounder.
Certification
Approve outputs and keep behavior stable in production workflows.
Debugging
Reproduce errors reliably with identical input and seed.
Compliance
Support reproducible behavior requirements in regulated domains.
How It Works
```
Seed: 882194
Input: "Write a haiku about AI"

Output (every time):
Silicon neurons
Learning from endless data
Creativity blooms
```
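The seed-to-reproducibility link can be sketched with a small seeded generator. This uses a mulberry32-style PRNG purely for illustration; providers use their own internal RNGs, and identical seeds only guarantee identical outputs when everything else is held fixed.

```javascript
// Minimal seeded PRNG (mulberry32) illustrating why a fixed seed fixes
// the entire sequence of sampling decisions. Illustrative only.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const rngA = mulberry32(882194);
const rngB = mulberry32(882194);
// Same seed: every draw matches, so every sampled token would match.
const identical = rngA() === rngB(); // true
```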
PromptOps Workflow
- Create prompt and enable deterministic mode
- Execute prompt with locked parameters
- Review output quality
- Certify approved output (gold snapshot)
- Validate future runs against snapshot
- Detect drift when outputs differ
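The certify-then-validate steps above can be sketched as a snapshot comparison. The `goldSnapshot` object and `validateRun` function are hypothetical names, assuming certified outputs are stored as plain strings.

```javascript
// Illustrative gold-snapshot drift check for deterministic runs.
const goldSnapshot = {
  prompt: "Write a haiku about AI",
  output: "Silicon neurons\nLearning from endless data\nCreativity blooms",
};

// Compare a fresh run against the certified snapshot.
function validateRun(snapshot, newOutput) {
  return snapshot.output === newOutput
    ? { status: "pass" }
    : { status: "drift", expected: snapshot.output, actual: newOutput };
}

validateRun(goldSnapshot, goldSnapshot.output); // { status: "pass" }
```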
Use Case Matrix
Recommended parameter combinations for common scenarios:
| Use Case | Temp | Max Tokens | Top P | Seed | Why |
|---|---|---|---|---|---|
| Chatbot | 0.7 | 1024 | 1.0 | No | Conversational and varied |
| Documentation | 0.3 | 4096 | 0.7 | No | Factual and detailed |
| Code Generation | 0.2 | 4096 | 0.5 | No | Consistent and focused |
| Creative Writing | 0.9 | 8192 | 0.95 | No | Diverse long-form outputs |
| Data Extraction | 0 | 512 | 0.1 | No | Precise and brief |
| A/B Testing | 0 | 2048 | 1.0 | Yes | Isolate prompt changes |
| Certification | 0 | 2048 | 1.0 | Yes | Reproducible compliance |
| Agent Tasks | 0.5 | 16384 | 0.8 | Optional | Complex long-running tasks |
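The matrix above can be expressed as a presets lookup in application code. Field names mirror common API parameters but are illustrative and not tied to any specific SDK.

```javascript
// Parameter presets mirroring the use-case matrix above. Illustrative shape.
const PRESETS = {
  chatbot:        { temperature: 0.7, maxTokens: 1024, topP: 1.0 },
  documentation:  { temperature: 0.3, maxTokens: 4096, topP: 0.7 },
  codeGeneration: { temperature: 0.2, maxTokens: 4096, topP: 0.5 },
  dataExtraction: { temperature: 0,   maxTokens: 512,  topP: 0.1 },
  certification:  { temperature: 0,   maxTokens: 2048, topP: 1.0, seed: 882194 },
};

// Look up a preset by use case, falling back to a safe default.
function presetFor(useCase) {
  return PRESETS[useCase] ?? PRESETS.documentation;
}
```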
Frequently Asked Questions
Why can't I set max tokens to 100,000?
Model and provider limits cap maximum output. For very long output, split work across multiple requests.
What's the difference between temperature and Top P?
Both control randomness: temperature rescales probabilities; Top P truncates the distribution by cumulative mass.
Do I always need a seed?
No. Use seed when reproducibility is required, such as testing, debugging, or compliance workflows.
Why can output change even with a seed?
Model version changes, parameter drift, or input differences can all change outputs despite the same seed.
What seed value should I use?
Any integer works. The value does not improve quality; it only controls reproducibility.
Can I use higher max tokens for agent tasks?
Yes, but monitor cost and latency. Test lower values first and increase only when needed.
Which providers support deterministic execution?
Support varies by model family and provider. Confirm current deterministic behavior in provider docs before production use.
Best Practices
Start Conservative
Begin with moderate defaults and tune based on measured outcomes.
Monitor Costs
Higher output limits raise costs. Apply tighter limits in development and testing.
Test Systematically
Change one parameter at a time to isolate its effect on output behavior.
Match to Use Case
Use low randomness for factual tasks and higher randomness for ideation tasks.
Check for Truncation
Increase max tokens when responses end abruptly or lose completeness.
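A truncation check can be automated, assuming an OpenAI-style response shape where `finish_reason` is `"length"` when output hit the max-token limit. Other providers expose equivalent stop-reason fields under different names.

```javascript
// Detect truncation, assuming an OpenAI-style response where a choice's
// finish_reason is "length" when the max-token limit was reached.
function wasTruncated(response) {
  return response.choices.some(c => c.finish_reason === "length");
}

wasTruncated({ choices: [{ finish_reason: "length" }] }); // true
wasTruncated({ choices: [{ finish_reason: "stop" }] });   // false
```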
Version Lock for Production
Pin exact model versions instead of aliases for stable behavior across releases.
Technical Deep Dive
How Nucleus Sampling Works
Top P removes the low-probability tail and samples from the remaining mass.
```
Probabilities: [0.5, 0.3, 0.1, 0.05, 0.04, 0.01]
Cumulative:    [0.5, 0.8, 0.9, 0.95, 0.99, 1.0]

Top P = 0.9
Considered: [0.5, 0.3, 0.1]
Excluded:   [0.05, 0.04, 0.01]
```
This adapts to confidence: focused when certainty is high, broader when uncertainty is high.
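The walkthrough above can be implemented directly: keep tokens from the sorted distribution until the cumulative mass reaches p. The function name is illustrative; real samplers then renormalize and sample from the kept set.

```javascript
// Nucleus (Top P) filtering over a probability list sorted descending:
// keep tokens until cumulative mass reaches p, drop the rest.
function topPFilter(sortedProbs, p) {
  const kept = [];
  let cumulative = 0;
  for (const prob of sortedProbs) {
    kept.push(prob);
    cumulative += prob;
    if (cumulative >= p) break; // nucleus reached
  }
  return kept;
}

topPFilter([0.5, 0.3, 0.1, 0.05, 0.04, 0.01], 0.9); // [0.5, 0.3, 0.1]
```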
Deterministic Execution Under the Hood
```
// Pseudocode: every token is drawn from a seeded RNG, so a fixed seed
// fixes the entire sampling sequence for a given input.
function generate(prompt, seed) {
  const rng = RandomGenerator(seed)
  const output = []
  let context = prompt
  while (!isComplete(output)) {
    const probabilities = model.predictNextToken(context)
    const token = rng.sample(probabilities)
    output.push(token)
    context += token
  }
  return output
}
```

With low temperature and a fixed seed, outputs become highly repeatable for identical inputs.
PromptOps Checksum Validation
```javascript
const config = {
  model: "gpt-4o-2024-08-06",
  temperature: 0,
  maxTokens: 2048,
  topP: 1.0,
  seed: 882194
};
const checksum = sha256(JSON.stringify(config));
```

Ready to Take Control?
Open Markdown Studio and tune advanced parameters with deterministic workflows.