Token
The basic unit of text that AI language models process. Tokens are not words — they are chunks of characters, typically 3–4 characters each. Every input and output is measured and billed in tokens. The maximum number of tokens a model can process at once is its context window.
What is a Token?
A token is the unit of text that language models work with internally. Before processing any input, a model's tokenizer splits the raw text into tokens — subword chunks that balance vocabulary size with processing efficiency. Common words are often a single token ("invoice" is 1 token). Less common words or long strings get split ("discontinuation" becomes 3–4 tokens). Spaces, punctuation, and numbers each consume tokens too. On average, 1 token is roughly 4 characters or 0.75 words in English.
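To see tokenization in action, the snippet below uses the open-source tiktoken library (one of several publicly available tokenizers; other model families ship their own, so exact splits vary):

```python
# Minimal tokenization demo using the tiktoken library (pip install tiktoken).
# The cl100k_base encoding is just one example; other tokenizers split differently.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["invoice", "discontinuation", "The invoice total is $4,250."]:
    tokens = enc.encode(text)
    pieces = [enc.decode([t]) for t in tokens]  # decode each token id back to text
    print(f"{text!r}: {len(tokens)} tokens -> {pieces}")
```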
Every operation — reading input, generating output — is counted in tokens. API pricing is denominated in tokens (input tokens and output tokens are often priced differently). Processing a 10-page contract costs more tokens than processing a one-paragraph email. Generating a 500-word summary costs more than a 50-word classification label.
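The billing arithmetic is simple enough to sketch. The per-1K-token prices below are placeholder assumptions, not any provider's actual rates:

```python
# Hypothetical cost estimator. The prices are illustrative placeholders,
# NOT real rates; substitute your provider's actual pricing.
PRICE_PER_1K_INPUT = 0.003   # USD per 1,000 input tokens (assumed)
PRICE_PER_1K_OUTPUT = 0.015  # USD per 1,000 output tokens (assumed)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + \
           (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# A multi-page contract in (~7,500 tokens) with a 500-word summary out (~650 tokens)
print(f"${estimate_cost(7_500, 650):.4f}")
# A one-paragraph email in (~150 tokens) with a one-word label out (~5 tokens)
print(f"${estimate_cost(150, 5):.4f}")
```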
Context Windows: The Token Limit That Matters
Every model has a context window: the maximum number of tokens it can process in a single interaction. This includes the system prompt, any retrieved documents (in RAG pipelines), the user input, and the model's own output. Common context windows range from 8,000 tokens (roughly 6,000 words) to 200,000 tokens (roughly 150,000 words) for frontier models.
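Everything in a request draws from the same budget, which a rough pre-flight check makes explicit. This is a sketch: the 8,000-token window is an assumed figure, and the character-based token_count helper stands in for a real tokenizer:

```python
# Sketch: budgeting one request against a context window.
CONTEXT_WINDOW = 8_000  # assumed window size for a small model

def token_count(text: str) -> int:
    # Rough heuristic from above: ~4 characters per token in English.
    return max(1, len(text) // 4)

def fits(system_prompt: str, retrieved_docs: list[str], user_input: str,
         max_output_tokens: int) -> bool:
    used = (token_count(system_prompt)
            + sum(token_count(d) for d in retrieved_docs)
            + token_count(user_input)
            + max_output_tokens)  # reserve room for the model's own output
    return used <= CONTEXT_WINDOW
```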
Exceeding the context window means content gets cut off — the model cannot see or use what falls outside the limit. This matters practically:
- A 50-page contract may exceed smaller models' context windows
- Including too much retrieved context in a RAG pipeline leaves less room for reasoning
- Long conversation histories accumulate tokens and can overflow the window (a mitigation is sketched below)
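A common mitigation for the last point is to drop the oldest turns until the history fits. A minimal sketch, reusing the rough 4-characters-per-token heuristic from above:

```python
# Sketch: trim the oldest conversation turns until the history fits a token
# budget, keeping the most recent context. A real implementation would count
# with the model's actual tokenizer rather than a character heuristic.
def trim_history(messages: list[str], budget_tokens: int) -> list[str]:
    def tok(s: str) -> int:
        return max(1, len(s) // 4)
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):  # walk newest -> oldest
        if total + tok(msg) > budget_tokens:
            break
        kept.append(msg)
        total += tok(msg)
    return list(reversed(kept))     # restore chronological order
```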
Tokens in Operations
Token awareness is a cost and reliability discipline. For operations teams running high-volume document workflows — hundreds of invoices, POs, or emails per day — token consumption directly affects operating costs. Keeping prompts lean, summarizing long documents before processing them, and choosing models with appropriately sized context windows are practical optimizations. Running a 200K-token context model on a 500-token invoice is wasteful. Running a 4K-token model on a multi-page contract will truncate critical data.
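The routing logic that paragraph implies can be sketched directly: measure the document first, then pick the cheapest model whose window fits it. The model names, window sizes, and relative costs below are illustrative placeholders:

```python
# Sketch: route each document to the cheapest model whose context window
# fits it, with headroom for the prompt and the response. Model names,
# window sizes, and the headroom figure are illustrative assumptions.
MODELS = [
    # (name, context window in tokens, relative cost per token)
    ("small-4k",   4_000,   1.0),
    ("medium-32k", 32_000,  3.0),
    ("large-200k", 200_000, 10.0),
]
HEADROOM = 1_000  # reserve tokens for the prompt and the model's output

def pick_model(doc_tokens: int) -> str:
    for name, window, _cost in MODELS:  # list is ordered cheapest first
        if doc_tokens + HEADROOM <= window:
            return name
    raise ValueError("Document exceeds every model's context window; "
                     "summarize or chunk it first.")

print(pick_model(500))     # a short invoice -> "small-4k"
print(pick_model(15_000))  # a long contract -> "medium-32k"
```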