ZeroTool Workbench

AI Token Counter

Free browser-based token counter for GPT-5, GPT-4.1, GPT-4o, Claude Sonnet 4.6, Gemini 3, Llama 3.3, and DeepSeek V3. Live count and cost. No upload, no API key.

100% Client-Side · Your data never leaves your browser · Free · No Sign-Up

Token counts & cost

Prices are reference values reviewed in May 2026. Verify on each provider's pricing page.

How to Use

  1. Paste or type your prompt into the input box. Counts and cost cards update as you type.
  2. Each model card shows three values — token count, estimated cost for that text as input, and the per-million input/output rates.
  3. Cards labeled exact use the OpenAI BPE encoder. Cards labeled approx apply a calibrated factor over the same encoder; expect 5 to 10 percent drift from the provider’s own count.
  4. Open Token Visualization to see how an exact-count model splits the text. Each colored chip is one token. Hover a chip to read its token id.
  5. Click Copy stats to grab a one-line summary in plain text — useful for sharing context windows in code review or chat.

What Is a Token?

Large language models read text as a sequence of tokens, not characters or words. A token is a sub-word unit produced by a Byte Pair Encoding (BPE) algorithm trained on internet-scale text. A common English word typically maps to a single token, while rare strings, URLs, and CJK characters can each take several tokens. Pricing, rate limits, and context windows are all measured in tokens.
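The greedy merge process at the heart of BPE can be sketched in a few lines. The merge table below is invented for illustration; real tables like cl100k_base contain on the order of a hundred thousand learned merges:

```python
def bpe_encode(word, merges):
    """Apply BPE merge rules to a word, best-ranked merge first.

    `merges` maps a symbol pair to its rank (lower = merged earlier),
    mimicking how a trained BPE table is consulted.
    """
    symbols = list(word)  # start from individual characters
    while len(symbols) > 1:
        # Find the adjacent pair with the best (lowest) merge rank.
        ranked = [(merges.get((a, b), float("inf")), i)
                  for i, (a, b) in enumerate(zip(symbols, symbols[1:]))]
        best_rank, i = min(ranked)
        if best_rank == float("inf"):
            break  # no learned merge applies; stop
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols

# Toy merge table for illustration only.
toy_merges = {("t", "o"): 0, ("to", "k"): 1, ("e", "n"): 2, ("tok", "en"): 3}
print(bpe_encode("token", toy_merges))  # -> ['token']
print(bpe_encode("ten", toy_merges))   # -> ['t', 'en']
```

A word covered by the learned merges collapses to one token; an unfamiliar word stays split into more pieces, which is exactly why rare strings cost more.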

This tool uses OpenAI’s official cl100k_base and o200k_base tables, the same encoders used in production for GPT-3.5 through GPT-5. The o200k_base table doubled the vocabulary size relative to cl100k_base and tokenizes Chinese, Japanese, Korean, and Arabic with 20 to 40 percent fewer tokens. If you write multilingual prompts, the savings are real money.

Why Count Locally?

Public token counters that send prompts to a remote endpoint are convenient until they aren’t. System prompts and proprietary instructions routinely contain trade secrets, customer data, or unreleased product copy. A network round-trip leaks all of it. The browser-side approach in this tool keeps every byte on your machine and works in airplane mode after first load.

Cross-Provider Cost Comparison

Token counts are not directly comparable across providers because each tokenizer carves text differently. A 1,000-character paragraph might cost 250 OpenAI tokens but only 230 Claude tokens, or vice versa. Cost is the real comparable metric. The cost cards multiply token count by the published per-million input rate so you can compare providers on the same yardstick. For chat-heavy applications, lower-priced models like DeepSeek V3 and Llama 3.3 usually win on raw rate, while GPT-5 and Claude Sonnet 4.6 win on quality per dollar for complex reasoning.
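The arithmetic behind each cost card is a single multiplication. A minimal sketch, using placeholder model names and rates rather than the page's live reference values:

```python
# Hypothetical per-million-input-token rates in USD.
# Check each provider's pricing page for real numbers.
RATES_PER_M_INPUT = {
    "model-a": 1.25,
    "model-b": 0.27,
}

def input_cost_usd(tokens: int, model: str) -> float:
    """Cost of sending `tokens` input tokens to `model`."""
    return tokens * RATES_PER_M_INPUT[model] / 1_000_000

# The same text produces different token counts per provider,
# so compare costs, not counts.
print(input_cost_usd(250, "model-a"))
print(input_cost_usd(230, "model-b"))
```

Note that the comparison only works when each model's own token count feeds its own rate; multiplying one provider's count by another's rate mixes yardsticks.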

Privacy and Network Behavior

After the initial HTML and JavaScript load, this page makes no network requests for token data. The BPE vocabularies are bundled into the JavaScript chunk that ships with the page. Your input never appears in HTTP requests, analytics events, or error logs. You can verify this with the browser DevTools Network tab — once the page is ready, type freely and watch zero new requests appear.

FAQ

Is my prompt sent to any server?

No. Tokenization runs entirely in your browser. The cl100k_base and o200k_base BPE vocabularies ship with the page, and the encoder reads only your textarea. Your prompt never leaves the tab — there is no network call after the initial page load.

How accurate are non-OpenAI counts?

GPT-5, GPT-4.1, GPT-4o, GPT-4 Turbo, and GPT-3.5 counts are exact via OpenAI's official BPE tables. Claude, Gemini, Llama, and DeepSeek use a heuristic adjustment over the o200k_base baseline, calibrated against published samples — typically within 5 to 10 percent of the provider's own tokenizer. For billing-critical estimates, use the provider's count_tokens API on a representative prompt.
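The heuristic adjustment amounts to a scaling factor applied to the o200k_base count. A sketch, where the factor value is illustrative rather than the constant this page actually ships:

```python
def approx_provider_tokens(o200k_count: int, factor: float = 1.08) -> int:
    """Scale an o200k_base token count toward another provider's tokenizer.

    `factor` is a calibration constant fitted against sample texts;
    1.08 here is illustrative, not this page's shipped value.
    """
    return round(o200k_count * factor)

print(approx_provider_tokens(250))  # 250 * 1.08 -> 270
```

Because the factor is fitted on averages, individual prompts drift above or below it, which is where the 5 to 10 percent error band comes from.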

Are the prices live?

Prices are static reference values reviewed in May 2026. Headline rates change without notice, especially for batch tiers, regional billing, and prompt caching. Always verify against the provider's pricing page before committing budget.

What size of input does this handle?

Up to roughly 500 KB of text per session. Beyond that the textarea and tokenizer can stutter on lower-end devices. For batch workloads, count locally with OpenAI's official tokenizer (pip install tiktoken); for Claude, use Anthropic's count_tokens API, since Anthropic does not publish a local tokenizer for its current models.

Why doesn't my application's count match this number?

Production apps include system prompts, function or tool definitions, image and audio tokens, and chat formatting overhead that this counter intentionally ignores. The counter measures one body of plain text. To debug a billing line item, paste only the raw user content and compare against your application's user-only token field.
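The gap has a mechanical source: chat APIs wrap every message in formatting tokens. A rough accounting sketch, using the roughly-3-tokens-per-message overhead from OpenAI's published counting guidance as an estimate, not a billing-exact constant for every model:

```python
def chat_prompt_tokens(message_token_counts,
                       per_message_overhead=3,
                       reply_primer=3):
    """Estimate total chat prompt tokens from per-message content counts.

    Each message carries formatting overhead, and a few extra tokens
    prime the assistant's reply. Both constants are approximations.
    """
    return reply_primer + sum(per_message_overhead + n
                              for n in message_token_counts)

# A system prompt of 12 tokens plus a user message of 40 tokens
# bills as more than 52 tokens once formatting is added.
print(chat_prompt_tokens([12, 40]))  # -> 61
```

This is why pasting only the raw user content into the counter, then comparing against the user-only field in your application's usage breakdown, gives a like-for-like check.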