LLMs charge and constrain you by token, not by character, and tokenizers differ from one model family to the next. This tool counts tokens for OpenAI models (GPT-4o, GPT-4, GPT-3.5, o1, o3) using the exact tiktoken BPE tables (o200k_base and cl100k_base), and gives calibrated estimates for Anthropic Claude, Google Gemini, and Meta Llama based on each family's published characters-per-token ratio.
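For the estimated families, the arithmetic is roughly character count divided by a ratio. The sketch below is illustrative only: the function name `estimate_tokens` and the ratio values are assumptions, not the figures this tool uses, so swap in the ratio each vendor publishes.

```python
import math

# Hypothetical chars-per-token ratios, for illustration only.
# Replace them with the figure each vendor publishes.
CHARS_PER_TOKEN = {
    "claude": 3.5,
    "gemini": 4.0,
    "llama": 3.8,
}


def estimate_tokens(text: str, family: str) -> int:
    """Estimate a token count from character length and a chars-per-token ratio."""
    ratio = CHARS_PER_TOKEN[family]
    # Round up so the estimate errs on the side of more tokens, not fewer.
    return math.ceil(len(text) / ratio)
```

Rounding up is a deliberate choice here: an estimate that runs slightly high is safer against a context limit than one that runs low.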
Everything runs in your browser — your prompts, system messages, and source code never leave the device. The tokenizers are lazy-loaded so the page stays fast until you actually paste something.
LLM APIs price per token, not per request. A 50-page document can run to 30k+ tokens, which is the difference between roughly $0.10 and $1.00 per call depending on the model.
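The cost arithmetic can be sketched as tokens times a per-million price. The helper name `call_cost_usd` and both prices below are hypothetical, chosen only to show how a tenfold price gap carries straight through to the per-call cost.

```python
def call_cost_usd(input_tokens: int, usd_per_million_tokens: float) -> float:
    """Input cost of one call, given a price quoted per million tokens."""
    return input_tokens / 1_000_000 * usd_per_million_tokens


doc_tokens = 30_000
# Hypothetical prices for illustration, not quotes from any vendor.
cheap = call_cost_usd(doc_tokens, 3.00)
expensive = call_cost_usd(doc_tokens, 30.00)
print(f"${cheap:.2f} vs ${expensive:.2f} per call")
```

This covers input tokens only; output tokens are usually billed separately and would need their own term.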
Exceed the context limit and the request is typically rejected outright; in applications that trim input to fit, the earliest part of your input is silently dropped instead. The percentage column flags when you're close to the cliff.
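A percentage like the one in that column can be computed as below. This is a minimal sketch, not this tool's code: `context_usage`, the 80% warning threshold, and the limit used in the usage note are all assumptions.

```python
def context_usage(
    tokens: int, context_limit: int, warn_at: float = 80.0
) -> tuple[float, bool]:
    """Return (percent of the context window used, whether to warn)."""
    if context_limit <= 0:
        raise ValueError("context_limit must be positive")
    percent = tokens / context_limit * 100
    # warn_at is a hypothetical threshold; pick one that suits your margin.
    return percent, percent >= warn_at
```

For example, `context_usage(120_000, 128_000)` reports 93.75% used and sets the warning flag, assuming a 128,000-token window.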
When you build prompts dynamically (system + few-shot + user), counting tokens at each layer tells you how much room you have left for the actual user input.
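One way to track that headroom is to subtract each layer's count from the limit. The sketch assumes you already have a token count per layer; the names `remaining_budget` and `reserved_for_output`, and all the numbers, are hypothetical.

```python
def remaining_budget(
    context_limit: int, layers: dict[str, int], reserved_for_output: int = 0
) -> int:
    """Tokens left for user input after fixed prompt layers and reserved output."""
    used = sum(layers.values())
    # A negative result means the fixed layers alone already overflow the window.
    return context_limit - used - reserved_for_output


# Hypothetical per-layer counts for an 8,000-token window.
layers = {"system": 400, "few_shot": 1_500}
room = remaining_budget(8_000, layers, reserved_for_output=1_000)
print(room)
```

Reserving space for the reply matters because, for most models, input and output share the same context window.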