Tokens vs Words in AI Chat (2026): What Actually Gets Counted

What tokens are, how they differ from words, and what AI companion platforms actually count toward your limits — so you know what you're really paying for.

Apr 30, 2026 · 9 min read

When people first encounter the word "token" in the context of AI, they usually assume it's a fancy word for "word." It's not. Tokens are the actual unit of text that language models process, and they don't map neatly onto words. A token can be a whole word, part of a word, a punctuation mark, or a single character. The same sentence written two slightly different ways can produce noticeably different token counts.

This isn't a technical detail you can safely ignore. Token counts determine how much conversation fits into an AI's context window, how much you pay if you're using an API, and how the model perceives what you've written. Understanding tokens makes a lot of AI behavior less mysterious.

What a token actually is

A token is a chunk of text that the model has been trained to recognize as a single unit. The full vocabulary a model works with typically ranges from about 32,000 tokens to a little over 200,000, depending on the model. That sounds like a lot until you realize that large dictionaries list roughly 600,000 distinct English words, and the model also has to handle every other language, plus technical terms, names, and unusual constructions. The vocabulary can't store every word as its own token, so it stores common pieces.

In most current models, the way tokens get carved up follows a process called Byte Pair Encoding, or BPE. When the tokenizer was built, it learned which character sequences appear together frequently in a large body of text and bundled those into single tokens. Common words like "the" or "cat" usually get their own token. Less common words get broken into pieces. The word "tokenization" might split into "token" and "ization," because both pieces appear often enough across the training data to earn their own slots in the vocabulary.
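To make the merging idea concrete, here is a minimal sketch of BPE-style training on a handful of words. It is a toy under stated assumptions: the function name learn_bpe_merges is invented for this example, it starts from characters rather than bytes, and production tokenizers add pre-splitting rules and train on vastly more text.

```python
from collections import Counter


def learn_bpe_merges(words, num_merges):
    """Toy BPE training: repeatedly fuse the most frequent adjacent pair."""
    # Each word starts out as a tuple of single-character symbols.
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite each word with the winning pair fused into one symbol.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus
    return merges, corpus


merges, corpus = learn_bpe_merges(["token"] * 5 + ["tokens"] * 3 + ["taken"], num_merges=4)
print(merges)        # the learned merge rules, in order
print(list(corpus))  # how each word is segmented after the merges
```

Frequent sequences end up fused into larger symbols after a few rounds, while rarer words stay in smaller pieces. That is the same effect described above, just at a tiny scale.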

The OpenAI Help Center has a clear technical overview that explains the model's perspective: it takes your text, breaks it into tokens, processes those tokens, and then converts the predicted output tokens back into the words you see on screen. The AI never actually sees words. It sees tokens.

The rules of thumb that mostly work

There's a widely cited rule that one token equals about 0.75 words in English, or equivalently, that 1,000 words is roughly 1,300 tokens. This is approximate, not exact, but useful for quick estimates.

A few rules that hold up most of the time:

  • 100 tokens is about 75 English words
  • 1,000 tokens is about 750 words, or roughly 1.5 pages of typical text
  • 100,000 tokens is about 75,000 words, or roughly a 250-page novel
  • A short tweet of 280 characters is around 60-70 tokens
  • A typical email of 200 words is around 270 tokens
  • A blog post of 1,500 words is around 2,000 tokens
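The rules of thumb in this list reduce to two divisions. The sketch below assumes the 0.75 words-per-token ratio and the commonly quoted companion figure of about four characters per token. The function names are made up for illustration, and the results are estimates for English prose only.

```python
def estimate_tokens_from_words(word_count, words_per_token=0.75):
    """Rough token estimate for English prose from a word count."""
    return round(word_count / words_per_token)


def estimate_tokens_from_chars(char_count, chars_per_token=4.0):
    """Rough token estimate for English prose from a character count."""
    return round(char_count / chars_per_token)


for label, words in [("email", 200), ("blog post", 1500)]:
    print(f"{label}: {words} words is about {estimate_tokens_from_words(words)} tokens")
print(f"tweet: 280 characters is about {estimate_tokens_from_chars(280)} tokens")
```

For anything that has to be exact, such as billing or a hard context limit, count with the model's real tokenizer instead of estimating.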

These ratios drift for different content types. Technical writing with lots of jargon tokenizes less efficiently because uncommon words break into more pieces. Code tokenizes weirdly because programming syntax doesn't match natural language patterns. Long numbers are often split into several short digit groups, and special characters frequently take a token each.

Non-English languages tokenize less efficiently than English. The OpenAI documentation gives the example of "Cómo estás" (Spanish for "How are you"), which contains 5 tokens for 10 characters. By contrast, English "How are you" is 3 tokens for 11 characters. The same conversation in Japanese, Hindi, or Arabic typically uses two to seven times as many tokens as the English version, which means non-English users hit context window limits proportionally faster.

The same word can be different tokens

This is the part that genuinely surprises people. The token assigned to a word depends on context. Capitalization, leading spaces, and surrounding punctuation all affect tokenization.

The OpenAI Help Center documents a clear example with the word "red." When lowercase "red" appears in the middle of a sentence with a leading space, it tokenizes as one specific ID. When capitalized "Red" appears mid-sentence, still with a leading space, it's a different token ID. When "Red" appears at the start of a sentence with no leading space, it's yet another token ID.

The model treats each of these as a slightly different unit. They share semantic meaning during processing, but they're not literally the same token from the tokenizer's perspective. This is why you'll occasionally see AI output where capitalization seems random or where the same word renders differently in different positions. The model is operating on tokens, not letters.

Hamburger is an instructive case. In a commonly cited example, the word "hamburger" doesn't appear often enough in training data to earn a dedicated token, so it tokenizes as three separate pieces: "ham," "bur," and "ger." The exact split varies by tokenizer, but the principle holds: when the model generates text containing "hamburger," it's actually predicting sequential tokens that happen to combine into the word we recognize.
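Both the "red" and "hamburger" examples can be reproduced with a toy tokenizer. This sketch uses greedy longest-match lookup against a tiny hand-written vocabulary, which is a simplification: real BPE tokenizers apply learned merge rules instead. The token IDs here are invented and do not match any real model.

```python
# Toy vocabulary: pieces and IDs are invented for illustration only.
TOY_VOCAB = {
    " red": 101, " Red": 102, "Red": 103,  # three distinct entries for "red"
    "ham": 201, "bur": 202, "ger": 203,    # no single entry for "hamburger"
    " is": 301,
}
UNKNOWN_ID = 0


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match segmentation; unknown characters get UNKNOWN_ID."""
    pieces = []
    max_len = max(len(piece) for piece in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until one is in the vocabulary.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in vocab:
                break
        else:
            candidate = text[i]  # nothing matched: emit a single unknown character
        pieces.append((candidate, vocab.get(candidate, UNKNOWN_ID)))
        i += len(candidate)
    return pieces


print(toy_tokenize("Red is red"))
print(toy_tokenize("hamburger"))
```

The leading space is part of the token, which is why " red" and "Red" are separate vocabulary entries rather than one entry plus a formatting flag.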

Why this matters for actual use

The most direct consequence is the context window. Every AI has a maximum number of tokens it can handle in a single request, counting both what goes in and what comes out. Your message, the conversation history, the system prompt, the character card if you're using one, and the model's eventual reply all share that budget. When the budget fills up, something has to give, and what gets cut depends on the platform's choice of memory architecture.
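One simple way a platform could handle a full budget is to drop the oldest messages first. The sketch below assumes exactly that strategy. Real products may summarize, pin, or retrieve older content instead, and the function name, parameters, and numbers are all hypothetical.

```python
def trim_history(messages, context_window, fixed_tokens, reply_reserve):
    """Keep the newest messages that fit in the remaining token budget.

    messages: list of (text, token_count) pairs, oldest first.
    fixed_tokens: system prompt plus character card, always included.
    reply_reserve: tokens held back for the model's reply.
    """
    budget = context_window - fixed_tokens - reply_reserve
    kept, used = [], 0
    # Walk backwards from the newest message and stop at the first that won't fit.
    for text, tokens in reversed(messages):
        if used + tokens > budget:
            break
        kept.append((text, tokens))
        used += tokens
    kept.reverse()  # restore chronological order
    return kept, used


history = [("first message", 300), ("second message", 400), ("latest message", 200)]
kept, used = trim_history(history, context_window=1000, fixed_tokens=200, reply_reserve=150)
print([text for text, _ in kept], used)
```

With these made-up numbers the oldest message no longer fits, so the model never sees it. This is the mechanical reason a companion can appear to forget early parts of a conversation.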

Knowing tokens aren't words helps you estimate when you're approaching the limit. A long conversation that feels like maybe 100 messages is probably 5,000-15,000 tokens depending on message length, which can already exceed the working window that some consumer companion products actually keep in view once the system prompt and character card take their share. The "magic getting worse" feeling around the third week of regular AI companion use is partly a token budget hitting its ceiling.

The second consequence is cost. API-based AI services charge per token, with separate rates for input tokens (what you send) and output tokens (what the model generates). If you're building anything on top of an AI API, understanding token counts is the difference between predictable costs and surprise bills. The same conversation can cost dramatically more if it's full of unusual words that tokenize inefficiently.
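The billing arithmetic is simple enough to sketch. Many providers quote prices per million tokens, so this example assumes that convention. The rates used below are placeholders, not any provider's real prices.

```python
def estimate_cost(input_tokens, output_tokens, input_rate_per_million, output_rate_per_million):
    """Estimated charge in the same currency as the rates."""
    return (
        input_tokens * input_rate_per_million
        + output_tokens * output_rate_per_million
    ) / 1_000_000


# Placeholder rates for illustration only.
INPUT_RATE = 2.00   # per million input tokens
OUTPUT_RATE = 8.00  # per million output tokens

# One request: 12,000 tokens of prompt and history, 800 tokens of reply.
per_request = estimate_cost(12_000, 800, INPUT_RATE, OUTPUT_RATE)
print(f"per request: {per_request:.4f}")
print(f"per 10,000 requests: {per_request * 10_000:.2f}")
```

Conversation history is resent as input on every turn in most chat setups, so input tokens usually dominate the bill even when output tokens carry the higher rate.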

The third consequence is response length. When you ask an AI to "write a 300-word response," you're asking it to count words it doesn't actually see directly. The model is predicting one token at a time and doesn't have a reliable internal counter for words. As a thoughtful breakdown of word-count limitations explains, asking for exact word counts is asking the model to do something its architecture doesn't support well. You'll usually get something close to the requested length, but not exact. If you genuinely need a precise word count, the right move is generating a draft and then trimming or expanding it externally.
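Checking and trimming a draft outside the model takes only a few lines. This is a minimal sketch that treats any whitespace-separated chunk as a word. The helper names are invented, and a real editor would cut at a sentence boundary rather than mid-sentence.

```python
def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())


def trim_to_words(text, max_words):
    """Return text cut to at most max_words words (whitespace is normalized when cut)."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words])


draft = "The model wrote a little more than it was asked to write here."
print(word_count(draft))
print(trim_to_words(draft, 8))
```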

The tokenizer is a separate component

The tokenizer that converts your text into tokens is technically separate from the language model itself. Different model families use different tokenizers. GPT-4 and GPT-3.5 use a tokenizer called cl100k_base. Newer GPT-4o and GPT-5 models use a more efficient tokenizer called o200k_base. Claude uses Anthropic's own proprietary tokenizer. Llama 1 and Llama 2 use a SentencePiece-based tokenizer, while Llama 3 moved to a larger tiktoken-style BPE vocabulary. Gemini uses its own variant.

This means the same text can produce different token counts on different platforms. A 1,000-word article might be 1,250 tokens on GPT-4o, 1,300 on Claude, and 1,400 on Llama. These differences are usually small but they matter when you're comparing pricing across providers or trying to fit the same content into different models' context windows.

OpenAI runs a free tokenizer tool where you can paste any text and see exactly how it gets broken down for their models. Each token gets its own colored highlight, so you can see the boundaries the model is working with. It's a useful exercise to paste in a paragraph of your own writing and look at where the splits land. Most people are surprised at what gets bundled and what doesn't.

Practical patterns once you understand tokens

A few patterns become obvious once you know how tokens work.

Common, simple words are token-efficient. Writing "I want to ask about the new policy" tokenizes more efficiently than "I'm desiring inquiry regarding the recently-promulgated policy directive." Same content, vastly different token counts. For situations where you're paying per token or fighting context limits, simpler phrasing literally costs less.

Punctuation is its own token. Heavy punctuation, lots of em dashes, complex parenthetical asides, all add tokens. This is rarely a major factor but it's noticeable in highly stylized writing.

Numbers and dates can be surprisingly heavy. "April 30, 2026" might tokenize as four or five tokens depending on the tokenizer. "Q1 fiscal year 2026 results" might tokenize as eight or nine. If you're writing prompts that include lots of structured data, the token cost adds up faster than you'd expect.

Code is often the heaviest content per character. Programming languages have characters and patterns that don't match the natural-language patterns the tokenizer was optimized for. A 500-character code block can easily be 200-300 tokens, where the same character count of natural language would be closer to 125 tokens at the usual four characters per token.

What's coming for tokens

Tokenization is an active area of research. Newer tokenizers tend to be more efficient than older ones, fitting more content per token. The o200k_base tokenizer in newer OpenAI models is roughly 10-20% more efficient than the cl100k_base tokenizer in earlier GPT-4 versions, which means longer effective context windows on the same numerical token count.

Some research is exploring tokenizer-free approaches that operate at the byte level instead, which would eliminate the tokenization step entirely. These approaches haven't reached mainstream production models yet, but the direction is clear. Over time, tokens may become less visible to users, with the underlying mechanism still doing similar work but presenting cleaner abstractions.
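At the input level, "byte-level" just means the text's UTF-8 bytes serve as the sequence, with a fixed vocabulary of 256 possible values. The sketch below shows only that representation, not how any particular research model processes it. The helper names are invented.

```python
def to_byte_ids(text):
    """Represent text as its UTF-8 byte values (integers from 0 to 255)."""
    return list(text.encode("utf-8"))


def from_byte_ids(ids):
    """Invert to_byte_ids."""
    return bytes(ids).decode("utf-8")


for sample in ["hamburger", "Cómo estás"]:
    ids = to_byte_ids(sample)
    print(f"{sample!r}: {len(sample)} characters, {len(ids)} byte IDs")
    assert from_byte_ids(ids) == sample  # lossless round trip
```

The tradeoff is sequence length. A word that a subword tokenizer covers in a few tokens becomes one ID per byte, and accented or non-Latin characters take two or more bytes each.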

For now, the tokens-not-words distinction remains real and worth understanding. It explains a lot of AI behavior that otherwise seems arbitrary, and it gives you the foundation to use these tools more effectively.

Frequently asked

How many tokens are in a typical sentence?

A short sentence is usually 10-20 tokens. A medium sentence is 20-40. A long, complex sentence with multiple clauses might be 40-80. The relationship to word count is rough but useful: multiply word count by about 1.33 for typical English prose.

Why does my AI response cut off mid-sentence sometimes?

You probably hit a maximum output token limit. Most AI APIs have a configurable cap on how many tokens the model can generate in a single response, and when the cap is reached, generation stops regardless of whether the sentence is complete. Raising the limit (where supported) avoids the cutoff, and asking the model to continue usually picks up where it left off.
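If you're calling an API yourself, the response usually says why generation stopped, and you can loop on that signal. The sketch below assumes a hypothetical generate callable that returns the text and a stop reason. The value "length" mirrors what OpenAI's Chat Completions API reports in finish_reason, while other APIs use different names (Anthropic reports "max_tokens" in stop_reason).

```python
def generate_until_done(generate, prompt, max_rounds=5, truncated_reason="length"):
    """Call generate repeatedly until the reply is no longer cut off by the token cap.

    generate: hypothetical callable taking a prompt string and returning
              (text, finish_reason). Swap in a wrapper around your real API client.
    """
    parts = []
    for _ in range(max_rounds):
        # Feed back what has been written so far so the model can continue it.
        text, finish_reason = generate(prompt + "".join(parts))
        parts.append(text)
        if finish_reason != truncated_reason:
            break  # the model stopped on its own
    return "".join(parts)


# Stand-in for a real API call: the first reply is cut off, the second finishes.
canned = iter([("The reply was cut off mid-", "length"), ("sentence, then finished.", "stop")])
print(generate_until_done(lambda prompt: next(canned), "Tell me a story. "))
```

The max_rounds cap is there so a model that keeps hitting the limit can't loop, and bill you, indefinitely.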

Do tokens cost money?

In API contexts, yes. Most AI providers charge per token, with separate rates for input and output. In consumer products like ChatGPT or Claude.ai, the cost is bundled into the subscription, so you don't see individual token charges, but the tokens are still being counted in the background.

Can I count tokens before sending?

Yes, using tokenizer tools. OpenAI's official tokenizer is at platform.openai.com/tokenizer. Several third-party tools count tokens for Claude, Llama, and other models. Knowing your token count before sending is useful for fitting content into context limits.
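For OpenAI models you can also count locally with tiktoken, OpenAI's open-source tokenizer library. The sketch below assumes tiktoken is installed and can load its vocabulary files. If either fails, it falls back to the rough four-characters-per-token estimate. The count_tokens wrapper and its return format are invented for this example, and the count applies to OpenAI encodings only.

```python
def count_tokens(text, encoding_name="cl100k_base"):
    """Return (count, kind), where kind is "exact" or "estimate"."""
    try:
        import tiktoken  # third-party: pip install tiktoken

        encoding = tiktoken.get_encoding(encoding_name)
        return len(encoding.encode(text)), "exact"
    except Exception:
        # Library missing, or its vocabulary could not be loaded:
        # fall back to roughly four characters per token.
        return round(len(text) / 4), "estimate"


count, kind = count_tokens("How are you")
print(f"{count} tokens ({kind})")
```

Pass encoding_name="o200k_base" to count for the newer OpenAI models. Claude, Llama, and Gemini need their own providers' counting tools, because the same text tokenizes differently on each.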

Why does the same word sometimes use more tokens than expected?

Capitalization, leading spaces, and the word's frequency in training data all affect how it tokenizes. Common words usually get one token. Rare words break into pieces. Capitalized versions of common words sometimes tokenize differently than lowercase versions.

Are tokens the same thing as characters?

No. Characters are individual letters or symbols. Tokens are units the model processes, which can be one character, several characters, a whole word, or even more. The relationship between characters and tokens is roughly 4 characters per token in English, but it varies.

Is there a way to write text that's more "token-efficient"?

Yes. Use common words rather than rare ones. Avoid heavy punctuation when natural alternatives exist. Use plain prose rather than stylized formatting when context budget matters. The tradeoff is that token-efficient writing can sometimes feel less precise, so it's a balance, not a rule.