The maximum number of tokens a model can attend to at once, spanning both input and output.
The context window dictates feature ceilings: how long a conversation can run, how rich a brief can be, and whether you need retrieval. Bigger windows allow richer UX but cost more and can hurt latency. PMs must set truncation policies and decide when to upgrade models versus investing in compaction or retrieval.
Calculate worst-case token usage per flow (instructions + user text + history + citations + output). Add headroom for safety rails and tool schemas. Implement smart truncation (oldest-first for chat, low-signal sections for docs); a budget-and-truncation sketch follows below. In 2026, pair long-context models for drafting with cheaper short-context models for quick actions to balance cost and quality.
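To make the budgeting concrete, here is a minimal Python sketch of worst-case accounting plus oldest-first truncation. The window size, reserve values, and the chars-per-token heuristic are illustrative assumptions, not figures from this entry; a real tokenizer (e.g. tiktoken) should replace the estimator in practice.

```python
# Rough token-budget check for a chat flow: worst-case usage plus headroom,
# then oldest-first truncation of history when the budget is exceeded.
# All constants are illustrative; tune them to your model and flow.

CONTEXT_WINDOW = 16_000        # model limit, input + output combined
OUTPUT_RESERVE = 1_500         # worst-case response length
GUARDRAIL_RESERVE = 400        # safety instructions must never be truncated
TOOL_SCHEMA_RESERVE = 600      # tool/function definitions sent on every call


def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English prose."""
    return max(1, len(text) // 4)


def fit_history(system_prompt: str, history: list[str], user_msg: str) -> list[str]:
    """Drop the oldest history turns first until the request fits the window."""
    fixed = (estimate_tokens(system_prompt) + estimate_tokens(user_msg)
             + OUTPUT_RESERVE + GUARDRAIL_RESERVE + TOOL_SCHEMA_RESERVE)
    budget = CONTEXT_WINDOW - fixed
    kept: list[str] = []
    # Walk newest-to-oldest so the most recent turns survive truncation.
    for turn in reversed(history):
        cost = estimate_tokens(turn)
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    return list(reversed(kept))
```

Keeping the guardrail and tool-schema reserves outside the truncatable budget is the key design choice: history shrinks before safety rails ever do.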
A research assistant started truncating safety instructions after 30 turns, causing two incident tickets. Adding conversation summarization every 8 turns and reserving 400 tokens for guardrails eliminated truncation while keeping latency under 1.6 s.
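A minimal sketch of the fix described above, assuming a hypothetical `summarize` callable backed by a cheap model: old turns are rolled into a running summary on a fixed cadence so the guardrail reserve is never squeezed out. The cadence mirrors the example (every 8 turns); everything else is an assumption.

```python
SUMMARIZE_EVERY = 8  # compact the conversation on this cadence of turns


def compact_history(history: list[str], turn_count: int, summarize) -> list[str]:
    """Replace old turns with a running summary every SUMMARIZE_EVERY turns."""
    if turn_count % SUMMARIZE_EVERY != 0 or len(history) <= SUMMARIZE_EVERY:
        return history
    old, recent = history[:-SUMMARIZE_EVERY], history[-SUMMARIZE_EVERY:]
    # summarize() is a stand-in for a call to a cheap, short-context model.
    summary = summarize("\n".join(old))
    return [f"[Conversation summary] {summary}"] + recent
```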