Advanced LLM Parameters Guide
Master max tokens, top P, temperature, and deterministic execution for reliable AI behavior.
Overview
Advanced parameters give you fine-grained control over model behavior. Use them to optimize quality, cost, speed, and reproducibility.
Max Tokens
Max tokens controls the maximum length of model output. One token is roughly 4 characters or 0.75 words.
Use Case Guide
| Range | Approx. Words | Best For | Description |
|---|---|---|---|
| 256-512 | ~200-400 | Brief Q&A | Single paragraph answers, quick summaries |
| 512-1024 | ~400-750 | Chatbots | Conversational responses, moderate detail |
| 1024-2048 | ~750-1500 | Balanced | Default range for most use cases |
| 2048-4096 | ~1500-3000 | Documentation | Detailed explanations and tutorials |
| 4096-8192 | ~3000-6000 | Analysis | Comprehensive long-form analysis |
| 8192-16384 | ~6000-12000 | Agent Tasks | Multi-step reasoning and complex workflows |
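Using the ~0.75 words-per-token rule of thumb above, a rough budgeting helper can be sketched. This is illustrative only; real token counts vary by model and tokenizer, and the function names here are hypothetical.

```javascript
// Rough token budgeting from the ~0.75 words-per-token heuristic.
// Not a real tokenizer; use your provider's tokenizer for exact counts.
function wordsToTokens(wordCount) {
  return Math.ceil(wordCount / 0.75);
}

function tokensToWords(tokenCount) {
  return Math.floor(tokenCount * 0.75);
}

// Example: budgeting max tokens for a ~1,200-word documentation section.
const maxTokens = wordsToTokens(1200); // 1600 tokens
```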
Model-Specific Limits
Different models support different maximum output lengths.
| Model | Max Output | Max Input Context | Provider |
|---|---|---|---|
| GPT-5.2 | 128,000 | 400,000 | OpenAI |
| Claude Opus 4.6 | 128,000 | 200,000 | Anthropic |
| Gemini 2.5 Pro | 65,536 | 1,000,000 | Google |
| Claude Sonnet 4.5 | 64,000 | 200,000 | Anthropic |
| GPT-4.1 | 32,768 | 1,000,000 | OpenAI |
| GPT-4o | 16,384 | 128,000 | OpenAI |
| GPT-4o-mini | 16,384 | 128,000 | OpenAI |
| Claude 3.5 Sonnet | 8,192 | 200,000 | Anthropic |
| Gemini 2.0 Flash | 8,192 | 1,000,000 | Google |
| Gemini 1.5 Pro | 8,192 | 2,000,000 | Google |
| GPT-4 Turbo | 4,096 | 128,000 | OpenAI |
| Claude 3 Opus | 4,096 | 200,000 | Anthropic |
Cost & Latency Impact
Higher Cost
Output tokens often cost more than input tokens. Large responses can significantly increase total API spend.
Longer Wait
More generated tokens increase latency, especially for streaming responses.
Better Completeness
Higher limits reduce the risk of responses being cut off mid-thought.
Start Low
Start around 1024-2048 and increase only when you observe truncation.
Top P (Nucleus Sampling)
Top P controls randomness by limiting candidate tokens to a cumulative probability mass.
How It Works
Suppose the model assigns these next-token probabilities: Token A: 40%, Token B: 30%, Token C: 20%, Token D: 5%, Token E: 3%, Token F: 2%. With Top P = 0.9, the cumulative mass reaches 90% at Token C, so the model samples only from A-C and discards D-F.
Value Guide
| Value | Behavior | Best For |
|---|---|---|
| 0.1 | Ultra-focused | Factual Q&A, data extraction |
| 0.3-0.5 | Focused and consistent | Technical docs, code generation |
| 0.7-0.9 | Balanced | Creative writing, brainstorming |
| 0.95-1.0 | Maximum variety | Story writing, ideation |
Top P vs Temperature
| Aspect | Temperature | Top P |
|---|---|---|
| Method | Scales all probabilities | Filters low-probability tokens |
| Dynamic | No | Yes (adapts to confidence) |
| Extreme values | Can produce nonsense | Safer boundaries |
| Use with | Set one OR the other | Set one OR the other |
| Recommended | Most users | Advanced users |
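The mechanical difference in the table can be sketched in a few lines. Temperature raises every probability to the power 1/T and renormalizes, so lowering T sharpens the whole distribution rather than cutting tokens off. The function name is illustrative, not a real SDK call.

```javascript
// Illustrative temperature rescaling: p_i -> p_i^(1/T), renormalized.
// T < 1 sharpens the distribution; T > 1 flattens it; T = 1 is identity.
function applyTemperature(probs, temperature) {
  const scaled = probs.map(p => Math.pow(p, 1 / temperature));
  const total = scaled.reduce((a, b) => a + b, 0);
  return scaled.map(s => s / total);
}

const probs = [0.5, 0.3, 0.2];
applyTemperature(probs, 0.5); // sharper: top token's share rises above 0.5
applyTemperature(probs, 2.0); // flatter: probability mass spreads out
```

Contrast this with Top P, which leaves the kept probabilities untouched and simply discards the low-probability tail.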
Seed (Deterministic Execution)
A seed locks sampling randomness. With identical inputs and parameters, the same seed yields consistent outputs.
Why Deterministic Mode?
A/B Testing
Compare prompt variants without randomness as a confounder.
Certification
Approve outputs and keep behavior stable in production workflows.
Debugging
Reproduce errors reliably with identical input and seed.
Compliance
Support reproducible behavior requirements in regulated domains.
How It Works
```
Seed: 882194
Input: "Write a haiku about AI"

Output (every time):
Silicon neurons
Learning from endless data
Creativity blooms
```
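The seed-to-reproducibility link can be sketched with a small seeded generator. This uses a mulberry32-style PRNG purely for illustration; providers use their own internal RNGs, and identical seeds only guarantee identical outputs when everything else is held fixed.

```javascript
// Minimal seeded PRNG (mulberry32) illustrating why a fixed seed fixes
// the entire sequence of sampling decisions. Illustrative only.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const rngA = mulberry32(882194);
const rngB = mulberry32(882194);
// Same seed: every draw matches, so every sampled token would match.
const identical = rngA() === rngB(); // true
```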
PromptOps Workflow
- Create prompt and enable deterministic mode
- Execute prompt with locked parameters
- Review output quality
- Certify approved output (gold snapshot)
- Validate future runs against snapshot
- Detect drift when outputs differ
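The certify-then-validate steps above can be sketched as a snapshot comparison. The `goldSnapshot` object and `validateRun` function are hypothetical names, assuming certified outputs are stored as plain strings.

```javascript
// Illustrative gold-snapshot drift check for deterministic runs.
const goldSnapshot = {
  prompt: "Write a haiku about AI",
  output: "Silicon neurons\nLearning from endless data\nCreativity blooms",
};

// Compare a fresh run against the certified snapshot.
function validateRun(snapshot, newOutput) {
  return snapshot.output === newOutput
    ? { status: "pass" }
    : { status: "drift", expected: snapshot.output, actual: newOutput };
}

validateRun(goldSnapshot, goldSnapshot.output); // { status: "pass" }
```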
Use Case Matrix
Recommended parameter combinations for common scenarios:
| Use Case | Temp | Max Tokens | Top P | Seed | Why |
|---|---|---|---|---|---|
| Chatbot | 0.7 | 1024 | 1.0 | No | Conversational and varied |
| Documentation | 0.3 | 4096 | 0.7 | No | Factual and detailed |
| Code Generation | 0.2 | 4096 | 0.5 | No | Consistent and focused |
| Creative Writing | 0.9 | 8192 | 0.95 | No | Diverse long-form outputs |
| Data Extraction | 0 | 512 | 0.1 | No | Precise and brief |
| A/B Testing | 0 | 2048 | 1.0 | Yes | Isolate prompt changes |
| Certification | 0 | 2048 | 1.0 | Yes | Reproducible compliance |
| Agent Tasks | 0.5 | 16384 | 0.8 | Optional | Complex long-running tasks |
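The matrix above can be expressed as a presets lookup in application code. Field names mirror common API parameters but are illustrative and not tied to any specific SDK.

```javascript
// Parameter presets mirroring the use-case matrix above. Illustrative shape.
const PRESETS = {
  chatbot:        { temperature: 0.7, maxTokens: 1024, topP: 1.0 },
  documentation:  { temperature: 0.3, maxTokens: 4096, topP: 0.7 },
  codeGeneration: { temperature: 0.2, maxTokens: 4096, topP: 0.5 },
  dataExtraction: { temperature: 0,   maxTokens: 512,  topP: 0.1 },
  certification:  { temperature: 0,   maxTokens: 2048, topP: 1.0, seed: 882194 },
};

// Look up a preset by use case, falling back to a safe default.
function presetFor(useCase) {
  return PRESETS[useCase] ?? PRESETS.documentation;
}
```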
Frequently Asked Questions
Why can't I set max tokens to 100,000?
Model and provider limits cap maximum output. For very long output, split work across multiple requests.
What's the difference between temperature and Top P?
Both control randomness: temperature rescales probabilities; Top P truncates the distribution by cumulative mass.
Do I always need a seed?
No. Use seed when reproducibility is required, such as testing, debugging, or compliance workflows.
Why can output change even with a seed?
Model version changes, parameter drift, or input differences can all change outputs despite the same seed.
What seed value should I use?
Any integer works. The value does not improve quality; it only controls reproducibility.
Can I use higher max tokens for agent tasks?
Yes, but monitor cost and latency. Test lower values first and increase only when needed.
Which providers support deterministic execution?
Support varies by model family and provider. Confirm current deterministic behavior in provider docs before production use.
Best Practices
Start Conservative
Begin with moderate defaults and tune based on measured outcomes.
Monitor Costs
Higher output limits raise costs. Apply tighter limits in development and testing.
Test Systematically
Change one parameter at a time to isolate its effect on output behavior.
Match to Use Case
Use low randomness for factual tasks and higher randomness for ideation tasks.
Check for Truncation
Increase max tokens when responses end abruptly or lose completeness.
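A truncation check can be automated, assuming an OpenAI-style response shape where `finish_reason` is `"length"` when output hit the max-token limit. Other providers expose equivalent stop-reason fields under different names.

```javascript
// Detect truncation, assuming an OpenAI-style response where a choice's
// finish_reason is "length" when the max-token limit was reached.
function wasTruncated(response) {
  return response.choices.some(c => c.finish_reason === "length");
}

wasTruncated({ choices: [{ finish_reason: "length" }] }); // true
wasTruncated({ choices: [{ finish_reason: "stop" }] });   // false
```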
Version Lock for Production
Pin exact model versions instead of aliases for stable behavior across releases.
Technical Deep Dive
How Nucleus Sampling Works
Top P removes the low-probability tail and samples from the remaining mass.
```
Probabilities: [0.5, 0.3, 0.1, 0.05, 0.04, 0.01]
Cumulative:    [0.5, 0.8, 0.9, 0.95, 0.99, 1.0]

Top P = 0.9
Considered: [0.5, 0.3, 0.1]
Excluded:   [0.05, 0.04, 0.01]
```
This adapts to confidence: focused when certainty is high, broader when uncertainty is high.
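The walkthrough above can be implemented directly: keep tokens from the sorted distribution until the cumulative mass reaches p. The function name is illustrative; real samplers then renormalize and sample from the kept set.

```javascript
// Nucleus (Top P) filtering over a probability list sorted descending:
// keep tokens until cumulative mass reaches p, drop the rest.
function topPFilter(sortedProbs, p) {
  const kept = [];
  let cumulative = 0;
  for (const prob of sortedProbs) {
    kept.push(prob);
    cumulative += prob;
    if (cumulative >= p) break; // nucleus reached
  }
  return kept;
}

topPFilter([0.5, 0.3, 0.1, 0.05, 0.04, 0.01], 0.9); // [0.5, 0.3, 0.1]
```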
Deterministic Execution Under the Hood
```
// Pseudocode: every token is drawn from a seeded RNG, so a fixed seed
// fixes the entire sampling sequence for a given input.
function generate(prompt, seed) {
  const rng = RandomGenerator(seed)
  const output = []
  let context = prompt
  while (!isComplete(output)) {
    const probabilities = model.predictNextToken(context)
    const token = rng.sample(probabilities)
    output.push(token)
    context += token
  }
  return output
}
```

With low temperature and a fixed seed, outputs become highly repeatable for identical inputs.
PromptOps Checksum Validation
```javascript
const config = {
  model: "gpt-4o-2024-08-06",
  temperature: 0,
  maxTokens: 2048,
  topP: 1.0,
  seed: 882194
};
const checksum = sha256(JSON.stringify(config));
```

Ready to Take Control?
Open Markdown Studio and tune advanced parameters with deterministic workflows.