AI Product Management Glossary (2026)

Built by PMs for PMs. Use this glossary to align your team on decision-critical concepts—context, retrieval, agents, evals, and safety—so you can ship reliable AI features faster.

Retrieval and Knowledge

11 terms

Agent memory

The mechanisms an agent uses to remember and reuse past interactions or facts across turns and sessions.

Chunking strategy

How you split documents or histories into pieces for indexing so retrieval balances relevance, completeness, and speed.
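
A minimal sketch of one common strategy, fixed-size windows with overlap; real pipelines often split on sentences or headings instead, and the sizes here are illustrative.

```python
def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping windows so each retrieval hit keeps local context."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

pieces = chunk("some long document text " * 200)  # index each piece with source metadata
```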

Citations and attribution

Showing which sources support an answer, with links or identifiers users can verify.

Context compaction

Reducing and reshaping context (summaries, salience scoring, deduplication) so key facts fit within token and latency budgets.

Data provenance

Tracking the origin, transformations, and permissions of data used for training, retrieval, or responses.

Embeddings

Vector representations of text or data that capture semantic meaning, enabling similarity search, clustering, and ranking.
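
A sketch of the operation embeddings enable, cosine similarity; the three-dimensional vectors are toy stand-ins for the hundreds or thousands of dimensions a real embedding model returns.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """1.0 = same direction (semantically close), near 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query_vec = [0.9, 0.1, 0.0]  # toy vectors; an embedding model produces these
doc_vec = [0.8, 0.2, 0.1]
print(cosine(query_vec, doc_vec))
```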

Grounded answers

Responses that are explicitly supported by retrieved or verifiable sources, reducing hallucination risk.

Knowledge freshness

Keeping the information an AI feature relies on up to date, and detecting when stale data harms quality.

Reranking

A second-pass model or heuristic that orders retrieved items by relevance before feeding them to the LLM.
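
A sketch with a keyword-overlap heuristic as the second pass; production rerankers typically replace score() with a cross-encoder model call.

```python
def score(query: str, doc: str) -> float:
    """Heuristic relevance: fraction of query terms that appear in the doc."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def rerank(query: str, candidates: list[str], top_k: int = 3) -> list[str]:
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)[:top_k]

print(rerank("refund policy", ["shipping times", "our refund policy", "careers"]))
```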

Retrieval-Augmented Generation (RAG)

Pairing LLM reasoning with external retrieval so responses cite up-to-date, relevant sources instead of relying on model memory.
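
A toy end-to-end loop, assuming keyword retrieval over an in-memory corpus and a stubbed llm() standing in for your model provider.

```python
CORPUS = {
    "doc1": "Refunds are processed within 5 business days.",
    "doc2": "Premium plans include priority support.",
}

def retrieve(question: str, k: int = 2) -> list[tuple[str, str]]:
    """Rank corpus docs by how many question terms they share."""
    q = set(question.lower().split())
    ranked = sorted(CORPUS.items(),
                    key=lambda kv: len(q & set(kv[1].lower().split())),
                    reverse=True)
    return ranked[:k]

def llm(prompt: str) -> str:  # stub: replace with a real chat-completion call
    return "Refunds take 5 business days [doc1]."

def answer(question: str) -> str:
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(question))
    return llm(f"Answer using only these sources and cite their ids.\n"
               f"Sources:\n{sources}\n\nQuestion: {question}")

print(answer("How long do refunds take?"))
```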

Vector database

A storage and query engine optimized for vector similarity search, often combined with metadata filtering and hybrid search.

Agents and Tooling

12 terms

Agent orchestration

Coordinating how agents, tools, and models are invoked, sequenced, and supervised within a product.

Agentic workflow

A product flow where an agent chains reasoning, tool use, and checkpoints to achieve a user goal with minimal hand-holding.

AI agent

A system where an LLM plans and executes actions toward a goal using tools, memory, and feedback loops.

Function calling

An API pattern where the LLM returns a structured call to a specified function, often validated and executed by your code.
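
A sketch of the validate-and-execute half; the JSON string imitates a model's structured call (field names vary by provider), and get_weather is a hypothetical tool.

```python
import json

def get_weather(city: str) -> str:  # hypothetical tool implementation
    return f"18°C and cloudy in {city}"

REGISTRY = {"get_weather": get_weather}

model_output = '{"name": "get_weather", "arguments": {"city": "Berlin"}}'
call = json.loads(model_output)

fn = REGISTRY.get(call["name"])
if fn is None:
    raise ValueError(f"model requested unknown function: {call['name']}")
print(fn(**call["arguments"]))  # the result is sent back to the model
```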

Long-running agent harness

Infrastructure to run, monitor, and resume agents that operate over minutes to hours with checkpoints and persistence.

Model Context Protocol (MCP)

A protocol for connecting models to external tools and data sources in a standardized, secure way.

Multi-agent system

A setup where multiple specialized agents collaborate or compete to solve a task, often with coordination rules.

Planner-executor pattern

Splitting an agent into a planning component that outlines steps and an executor that performs them, often with feedback.

Reflection loop

A pattern where the model critiques or scores its own output (or an agent’s step) before finalizing or retrying.

Tool calling

Allowing a model to invoke predefined functions or APIs with structured arguments during its reasoning loop.

Tool reliability

How often model-invoked tools succeed, how they fail, and how gracefully the system recovers.

Tool schema design

Crafting clear input/output definitions for tools exposed to the model to ensure safe, correct, and efficient calls.
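
A sketch of a tool definition in JSON Schema style, a shape most providers accept in some close variant; the tool and its fields are illustrative. Tight types, enums, and unambiguous descriptions are what keep model-generated calls correct.

```python
# Hypothetical read-only lookup tool exposed to the model.
search_orders_tool = {
    "name": "search_orders",
    "description": "Look up a customer's orders. Read-only; never mutates data.",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string",
                            "description": "Internal ID, not an email address"},
            "status": {"type": "string",
                       "enum": ["open", "shipped", "refunded"]},
            "limit": {"type": "integer", "minimum": 1, "maximum": 50},
        },
        "required": ["customer_id"],
    },
}
```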

A–Z index

A
Agent memory

The mechanisms an agent uses to remember and reuse past interactions or facts across turns and sessions.

Agent orchestration

Coordinating how agents, tools, and models are invoked, sequenced, and supervised within a product.

Agentic workflow

A product flow where an agent chains reasoning, tool use, and checkpoints to achieve a user goal with minimal hand-holding.

AI agent

A system where an LLM plans and executes actions toward a goal using tools, memory, and feedback loops.

C
Chunking strategy

How you split documents or histories into pieces for indexing so retrieval balances relevance, completeness, and speed.

Citations and attribution

Showing which sources support an answer, with links or identifiers users can verify.

Content filtering

Screening inputs and outputs for toxicity, abuse, violence, or other policy-violating content.

Context compaction

Reducing and reshaping context (summaries, salience scoring, deduplication) so key facts fit within token and latency budgets.

Context engineering

Deliberately shaping what the model sees—ordering, framing, and scoping inputs—to drive reliable, on-brand responses.

Context window

The maximum token length a model can attend to at once across input and output.

Cost per task

Total variable cost (tokens, tool calls, infra) to complete a user task with your AI feature.

D
Data leakage

Sensitive information being exposed to unauthorized users or external systems through model inputs, outputs, or logs.

Data provenance

Tracking the origin, transformations, and permissions of data used for training, retrieval, or responses.

E
Embeddings

Vector representations of text or data that capture semantic meaning, enabling similarity search, clustering, and ranking.

Evaluation harness

A repeatable pipeline that scores model or agent outputs against test cases and business metrics before and after changes.
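
A minimal sketch: run every golden case through the feature and report a pass rate; run_feature() is a stub for your real pipeline, and exact-match scoring stands in for richer checks.

```python
GOLDEN = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def run_feature(prompt: str) -> str:  # stub: call your model or agent here
    return {"2 + 2": "4", "capital of France": "Paris"}[prompt]

def pass_rate() -> float:
    passed = sum(run_feature(case["input"]) == case["expected"] for case in GOLDEN)
    return passed / len(GOLDEN)

print(f"pass rate: {pass_rate():.0%}")  # compare this number across releases
```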

F
Few-shot examples

Concrete input-output pairs included in the prompt to teach the model the desired style, structure, or reasoning without training.

Function calling

An API pattern where the LLM returns a structured call to a specified function, often validated and executed by your code.

G
Golden set

A curated collection of test cases with trusted answers used to judge model quality over time.

Grounded answers

Responses that are explicitly supported by retrieved or verifiable sources, reducing hallucination risk.

Guardrails

Policies and technical controls that constrain what an AI can say or do, preventing harmful or out-of-scope behavior.

H
Hallucination rate

How often a model produces unsupported or incorrect facts relative to total responses.

J
Jailbreak

A crafted input designed to bypass safety constraints and make the model produce disallowed content or actions.

K
Knowledge freshness

Keeping the information an AI feature relies on up to date, and detecting when stale data harms quality.

L
Latency budget

The maximum response time you can spend across model calls, tools, and orchestration while meeting UX and business goals.

LLM-as-a-judge

Using a model to score another model’s outputs against criteria, often faster and cheaper than human labeling.
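
A sketch of a single judge call with a stubbed judge_llm(); in practice, rubric wording and score parsing carry most of the design work.

```python
RUBRIC = ("Score the ANSWER from 1-5 for factual grounding in the SOURCES. "
          "Reply with only the integer.")

def judge_llm(prompt: str) -> str:  # stub: replace with a real model call
    return "4"

def judge(sources: str, answer: str) -> int:
    reply = judge_llm(f"{RUBRIC}\n\nSOURCES:\n{sources}\n\nANSWER:\n{answer}")
    return int(reply.strip())

print(judge("Refunds take 5 business days.", "Refunds are processed in 5 days."))
```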

Long-running agent harness

Infrastructure to run, monitor, and resume agents that operate over minutes to hours with checkpoints and persistence.

M
Model Context Protocol (MCP)

A protocol for connecting models to external tools and data sources in a standardized, secure way.

Multi-agent system

A setup where multiple specialized agents collaborate or compete to solve a task, often with coordination rules.

O
Offline evals

Quality tests run on recorded or synthetic data without live users, giving fast, safe feedback on changes.

Online evals

Live experiments that measure model changes with real user traffic, often via A/B tests or shadow deployments.

P
PII redaction

Detecting and removing personally identifiable information from inputs, outputs, or stored data to prevent exposure.
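
A minimal regex pass for emails and US-style phone numbers; real pipelines add NER models and locale-aware patterns, so treat these two rules as illustrative.

```python
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each PII match with a typed placeholder before logging or prompting."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach me at ana@example.com or 555-867-5309."))
```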

Planner-executor pattern

Splitting an agent into a planning component that outlines steps and an executor that performs them, often with feedback.

Prompt engineering

Designing and testing instructions, examples, and constraints so an LLM produces outputs that meet product requirements.

Prompt injection

An attempt, through user input or content embedded in documents, to override system instructions and make the model act outside intended bounds.

Prompt library

A governed collection of reusable, versioned prompts and context blocks that teams can consume safely.

Prompt template

A parameterized prompt pattern that inserts dynamic data while preserving structure, tone, and constraints.
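
A sketch using Python's standard-library string.Template; the product name, tone rules, and constraints are illustrative.

```python
from string import Template

# Fixed structure and constraints, dynamic slots for per-request data.
SUPPORT_REPLY = Template(
    "You are a support assistant for $product. Be concise and polite.\n"
    "Never promise refunds or discounts.\n\n"
    "Customer message:\n$message\n\n"
    "Draft a reply."
)

prompt = SUPPORT_REPLY.substitute(product="Acme", message="My invoice is wrong.")
print(prompt)
```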

R
Red teaming

Systematic attempts to break or exploit an AI system to uncover safety and security weaknesses before attackers do.

Reflection loop

A pattern where the model critiques or scores its own output (or an agent’s step) before finalizing or retrying.

Regression testing (LLM)

Running automated checks to ensure a change doesn’t reintroduce past bugs or quality drops in model behavior.

Reranking

A second-pass model or heuristic that orders retrieved items by relevance before feeding them to the LLM.

Retrieval-Augmented Generation (RAG)

Pairing LLM reasoning with external retrieval so responses cite up-to-date, relevant sources instead of relying on model memory.

S
Structured outputs

Requiring the model to return JSON or another strict, schema-validated format so downstream systems can parse results reliably.
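
A sketch of the consuming side, parse then validate; the raw string imitates a model reply, and production systems pair this with schema-constrained decoding or a retry loop.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}

def parse_reply(raw: str) -> dict:
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if data.get("sentiment") not in ALLOWED_SENTIMENTS:
        raise ValueError("sentiment outside allowed values")
    if not isinstance(data.get("escalate"), bool):
        raise ValueError("escalate must be a boolean")
    return data

print(parse_reply('{"sentiment": "negative", "escalate": true}'))
```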

System prompt

The always-on instruction block that sets persona, guardrails, and priorities for every model call in your product.

T
Task success rate

The percentage of user or agent tasks completed correctly without human rework or retries.

Token budget

The maximum tokens you allocate per request across prompt, tools, and output to control latency and cost.

Tool calling

Allowing a model to invoke predefined functions or APIs with structured arguments during its reasoning loop.

Tool instructions

Explicit guidance given to a model about when and how to call tools or APIs, including constraints and safety rules.

Tool reliability

How often model-invoked tools succeed, how they fail, and how gracefully the system recovers.

Tool schema design

Crafting clear input/output definitions for tools exposed to the model to ensure safe, correct, and efficient calls.

V
Vector database

A storage and query engine optimized for vector similarity search, often combined with metadata filtering and hybrid search.

Learn it in CraftUp

Go deeper with our courses, resources, and blog. Start learning free in the app—no fluff, just practical AI product skills.