Token
The basic unit of text that AI language models process. Tokens are not words — they are chunks of characters, typically 3–4 characters each. Every input and output is measured and billed in tokens. The maximum number of tokens a model can process at once is its context window.
What is a Token?
A token is the unit of text that language models work with internally. Before processing any input, a model's tokenizer splits the raw text into tokens — subword chunks that balance vocabulary size with processing efficiency. Common words are often a single token ("invoice" is 1 token). Less common words or long strings get split ("discontinuation" becomes 3–4 tokens). Spaces, punctuation, and numbers each consume tokens too. On average, 1 token is roughly 4 characters or 0.75 words in English.
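To see tokenization in action, the snippet below uses the open-source tiktoken library (one of several publicly available tokenizers; other model families ship their own, so exact splits vary):

```python
# Minimal tokenization demo using the tiktoken library (pip install tiktoken).
# The cl100k_base encoding is just one example; other tokenizers split differently.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["invoice", "discontinuation", "The invoice total is $4,250."]:
    tokens = enc.encode(text)
    pieces = [enc.decode([t]) for t in tokens]  # decode each token id back to text
    print(f"{text!r}: {len(tokens)} tokens -> {pieces}")
```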
Every operation — reading input, generating output — is counted in tokens. API pricing is denominated in tokens (input tokens and output tokens are often priced differently). Processing a 10-page contract costs more tokens than processing a one-paragraph email. Generating a 500-word summary costs more than a 50-word classification label.
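The billing arithmetic is simple enough to sketch. The per-1K-token prices below are placeholder assumptions, not any provider's actual rates:

```python
# Hypothetical cost estimator. The prices are illustrative placeholders,
# NOT real rates; substitute your provider's actual pricing.
PRICE_PER_1K_INPUT = 0.003   # USD per 1,000 input tokens (assumed)
PRICE_PER_1K_OUTPUT = 0.015  # USD per 1,000 output tokens (assumed)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + \
           (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# A multi-page contract in (~7,500 tokens) with a 500-word summary out (~650 tokens)
print(f"${estimate_cost(7_500, 650):.4f}")
# A one-paragraph email in (~150 tokens) with a one-word label out (~5 tokens)
print(f"${estimate_cost(150, 5):.4f}")
```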
Context Windows: The Token Limit That Matters
Every model has a context window: the maximum number of tokens it can process in a single interaction. This includes the system prompt, any retrieved documents (in RAG pipelines), the user input, and the model's own output. Common context windows range from 8,000 tokens (roughly 6,000 words) to 200,000 tokens (roughly 150,000 words) for frontier models.
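Everything in a request draws from the same budget, which a rough pre-flight check makes explicit. This is a sketch: the 8,000-token window is an assumed figure, and the character-based token_count helper stands in for a real tokenizer:

```python
# Sketch: budgeting one request against a context window.
CONTEXT_WINDOW = 8_000  # assumed window size for a small model

def token_count(text: str) -> int:
    # Rough heuristic from above: ~4 characters per token in English.
    return max(1, len(text) // 4)

def fits(system_prompt: str, retrieved_docs: list[str], user_input: str,
         max_output_tokens: int) -> bool:
    used = (token_count(system_prompt)
            + sum(token_count(d) for d in retrieved_docs)
            + token_count(user_input)
            + max_output_tokens)  # reserve room for the model's own output
    return used <= CONTEXT_WINDOW
```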
Exceeding the context window means content gets cut off — the model cannot see or use what falls outside the limit. This matters practically:
- A 50-page contract may exceed smaller models' context windows
- Including too much retrieved context in a RAG pipeline leaves less room for reasoning
- Long conversation histories accumulate tokens and can overflow the window (a mitigation is sketched below)
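A common mitigation for the last point is to drop the oldest turns until the history fits. A minimal sketch, reusing the rough 4-characters-per-token heuristic from above:

```python
# Sketch: trim the oldest conversation turns until the history fits a token
# budget, keeping the most recent context. A real implementation would count
# with the model's actual tokenizer rather than a character heuristic.
def trim_history(messages: list[str], budget_tokens: int) -> list[str]:
    def tok(s: str) -> int:
        return max(1, len(s) // 4)
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):  # walk newest -> oldest
        if total + tok(msg) > budget_tokens:
            break
        kept.append(msg)
        total += tok(msg)
    return list(reversed(kept))     # restore chronological order
```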
Tokens in Operations
Token awareness is a cost and reliability discipline. For operations teams running high-volume document workflows — hundreds of invoices, POs, or emails per day — token consumption directly affects operating costs. Keeping prompts lean, summarizing long documents before processing them, and choosing models with appropriately sized context windows are practical optimizations. Running a 200K-token context model on a 500-token invoice is wasteful. Running a 4K-token model on a multi-page contract will truncate critical data.
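The routing logic that paragraph implies can be sketched directly: measure the document first, then pick the cheapest model whose window fits it. The model names, window sizes, and relative costs below are illustrative placeholders:

```python
# Sketch: route each document to the cheapest model whose context window
# fits it, with headroom for the prompt and the response. Model names,
# window sizes, and the headroom figure are illustrative assumptions.
MODELS = [
    # (name, context window in tokens, relative cost per token)
    ("small-4k",   4_000,   1.0),
    ("medium-32k", 32_000,  3.0),
    ("large-200k", 200_000, 10.0),
]
HEADROOM = 1_000  # reserve tokens for the prompt and the model's output

def pick_model(doc_tokens: int) -> str:
    for name, window, _cost in MODELS:  # list is ordered cheapest first
        if doc_tokens + HEADROOM <= window:
            return name
    raise ValueError("Document exceeds every model's context window; "
                     "summarize or chunk it first.")

print(pick_model(500))     # a short invoice -> "small-4k"
print(pick_model(15_000))  # a long contract -> "medium-32k"
```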