CraftUp · 2026
Built by PMs for PMs. Use this glossary to align your team on decision-critical concepts—context, retrieval, agents, evals, and safety—so you can ship reliable AI features faster.
Context
Context engineering: Deliberately shaping what the model sees (ordering, framing, and scoping inputs) to drive reliable, on-brand responses.
Context window: The maximum token length a model can attend to at once across input and output.
Few-shot examples: Concrete input-output pairs included in the prompt to teach the model the desired style, structure, or reasoning without training.
Prompt engineering: Designing and testing instructions, examples, and constraints so an LLM produces outputs that meet product requirements.
Prompt library: A governed collection of reusable, versioned prompts and context blocks that teams can consume safely.
Prompt template: A parameterized prompt pattern that inserts dynamic data while preserving structure, tone, and constraints.
Structured output: Requiring the model to return JSON or another strict schema so downstream systems can parse results reliably (see the sketch after this section).
System prompt: The always-on instruction block that sets persona, guardrails, and priorities for every model call in your product.
Token budget: The maximum tokens you allocate per request across prompt, tools, and output to control latency and cost.
Tool instructions: Explicit guidance given to a model about when and how to call tools or APIs, including constraints and safety rules.
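To make a few of these concrete, here is a minimal Python sketch that combines a system prompt, few-shot examples, a prompt template, and structured-output validation. The `call_model` stub, the field names, and the Acme scenario are illustrative assumptions, not a specific vendor API.

```python
import json

# Hypothetical always-on system prompt: persona, guardrails, priorities.
SYSTEM_PROMPT = (
    "You are a concise support assistant for Acme. "
    "Answer only from the provided context and reply in JSON."
)

# Few-shot examples: input-output pairs that teach format without training.
FEW_SHOT = [
    {"ticket": "App crashes on login", "reply": {"category": "bug", "urgency": "high"}},
    {"ticket": "How do I export my data?", "reply": {"category": "question", "urgency": "low"}},
]

# A parameterized prompt template: dynamic data slots into a fixed structure.
TEMPLATE = "Classify this support ticket as JSON with keys 'category' and 'urgency':\n{ticket}"

def build_prompt(ticket: str) -> list[dict]:
    """Assemble messages: system prompt, few-shot pairs, then the live request."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for ex in FEW_SHOT:
        messages.append({"role": "user", "content": TEMPLATE.format(ticket=ex["ticket"])})
        messages.append({"role": "assistant", "content": json.dumps(ex["reply"])})
    messages.append({"role": "user", "content": TEMPLATE.format(ticket=ticket)})
    return messages

def call_model(messages: list[dict]) -> str:
    """Stand-in for a real model call; returns a canned JSON string."""
    return '{"category": "bug", "urgency": "medium"}'

def parse_structured_output(raw: str) -> dict:
    """Enforce the schema so downstream code can rely on the shape."""
    data = json.loads(raw)
    assert set(data) == {"category", "urgency"}, "unexpected keys"
    assert data["urgency"] in {"low", "medium", "high"}, "invalid urgency"
    return data

if __name__ == "__main__":
    reply = call_model(build_prompt("Sync fails behind a proxy"))
    print(parse_structured_output(reply))
```

The schema check at the boundary is what lets downstream code trust the model's reply; everything above it is just disciplined prompt assembly.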
Retrieval
Agent memory: The mechanisms an agent uses to remember and reuse past interactions or facts across turns and sessions.
Chunking strategy: How you split documents or histories into pieces for indexing so retrieval balances relevance, completeness, and speed.
Citations: Showing which sources support an answer, with links or identifiers users can verify.
Context compression: Reducing and reshaping context (summaries, salience scoring, deduplication) so key facts fit within token and latency budgets.
Data lineage: Tracking the origin, transformations, and permissions of data used for training, retrieval, or responses.
Embeddings: Vector representations of text or data that capture semantic meaning, enabling similarity search, clustering, and ranking.
Grounded responses: Responses that are explicitly supported by retrieved or verifiable sources, reducing hallucination risk.
Knowledge freshness: Keeping the information an AI feature relies on up to date, and detecting when stale data harms quality.
Re-ranking: A second-pass model or heuristic that orders retrieved items by relevance before feeding them to the LLM.
Retrieval-augmented generation (RAG): Pairing LLM reasoning with external retrieval so responses cite up-to-date, relevant sources instead of relying on model memory (the sketch after this section walks through the flow).
Vector database: A storage and query engine optimized for vector similarity search, often combined with metadata filtering and hybrid search.
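A minimal retrieval sketch, assuming a toy in-memory corpus: real systems use a trained embedding model and a vector database, but the flow (embed, retrieve top-k, re-rank, build a grounded prompt with citation ids) is the same. The bag-of-words "embedding", the `kb-*` ids, and the refund corpus are stand-ins.

```python
import math
from collections import Counter

# Toy corpus standing in for chunked documents; ids double as citation keys.
CHUNKS = {
    "kb-101": "Refunds are processed within 5 business days of approval.",
    "kb-102": "Enterprise plans include SSO and audit logs.",
    "kb-103": "Refund requests older than 90 days need manager approval.",
}

def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words counts. Real systems use a trained embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[tuple[str, float]]:
    """Vector-database stand-in: score every chunk and keep the top k."""
    q = embed(query)
    scored = [(cid, cosine(q, embed(text))) for cid, text in CHUNKS.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:k]

def rerank(query: str, hits):
    """Second-pass ordering; a keyword-overlap heuristic stands in for a reranker model."""
    q_terms = set(query.lower().split())
    return sorted(hits, key=lambda h: len(q_terms & set(CHUNKS[h[0]].lower().split())), reverse=True)

def build_grounded_prompt(query: str) -> str:
    """Assemble a prompt that forces the answer to cite retrieved sources by id."""
    hits = rerank(query, retrieve(query))
    context = "\n".join(f"[{cid}] {CHUNKS[cid]}" for cid, _ in hits)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )

if __name__ == "__main__":
    print(build_grounded_prompt("How long do refunds take?"))
```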
Agents
Agent orchestration: Coordinating how agents, tools, and models are invoked, sequenced, and supervised within a product.
Agentic workflow: A product flow where an agent chains reasoning, tool use, and checkpoints to achieve a user goal with minimal hand-holding.
AI agent: A system where an LLM plans and executes actions toward a goal using tools, memory, and feedback loops.
Function calling: An API pattern where the LLM returns a structured call to a specified function, often validated and executed by your code (see the sketch after this section).
Long-running agents: Infrastructure to run, monitor, and resume agents that operate over minutes to hours with checkpoints and persistence.
Model Context Protocol (MCP): A protocol for connecting models to external tools and data sources in a standardized, secure way.
Multi-agent system: A setup where multiple specialized agents collaborate or compete to solve a task, often with coordination rules.
Planner-executor pattern: Splitting an agent into a planning component that outlines steps and an executor that performs them, often with feedback.
Reflection: A pattern where the model critiques or scores its own output (or an agent's step) before finalizing or retrying.
Tool calling: Allowing a model to invoke predefined functions or APIs with structured arguments during its reasoning loop.
Tool reliability: How often model-invoked tools succeed, how they fail, and how gracefully the system recovers.
Tool schema design: Crafting clear input/output definitions for tools exposed to the model to ensure safe, correct, and efficient calls.
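The sketch below shows the tool-calling loop in miniature: a tool schema, a simulated model turn that returns a structured call, validation against the schema, and execution. `get_order_status`, the registry shape, and the canned model reply are hypothetical; in a real product this wires into a provider's function-calling API and the model decides when to call.

```python
import json

# Tool schema exposed to the model: clear name, purpose, and typed arguments.
ORDER_STATUS_TOOL = {
    "name": "get_order_status",
    "description": "Look up the shipping status of an order by id.",
    "parameters": {"order_id": {"type": "string", "required": True}},
}

def get_order_status(order_id: str) -> dict:
    """The real implementation the model is allowed to trigger (stubbed here)."""
    return {"order_id": order_id, "status": "shipped"}

TOOL_REGISTRY = {"get_order_status": (ORDER_STATUS_TOOL, get_order_status)}

def fake_model_turn(user_message: str) -> str:
    """Stand-in for the LLM: returns a structured tool call as JSON text."""
    return json.dumps({"tool": "get_order_status", "arguments": {"order_id": "A-1234"}})

def validate_call(call: dict) -> None:
    """Guard the boundary: malformed or unexpected arguments never execute."""
    schema, _ = TOOL_REGISTRY[call["tool"]]
    params = schema["parameters"]
    for name, spec in params.items():
        if spec.get("required") and name not in call["arguments"]:
            raise ValueError(f"missing argument: {name}")
    unknown = set(call["arguments"]) - set(params)
    if unknown:
        raise ValueError(f"unexpected arguments: {unknown}")

def run_tool_call(raw: str) -> dict:
    """Parse the model's structured call, validate it, then execute the mapped function."""
    call = json.loads(raw)
    if call["tool"] not in TOOL_REGISTRY:
        raise ValueError(f"unknown tool: {call['tool']}")
    validate_call(call)
    _, fn = TOOL_REGISTRY[call["tool"]]
    return fn(**call["arguments"])

if __name__ == "__main__":
    print(run_tool_call(fake_model_turn("Where is my order A-1234?")))
```

The validation step is where tool reliability is won or lost: unknown tools and bad arguments fail fast and visibly instead of silently corrupting an agent run.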
Evals
Cost per task: Total variable cost (tokens, tool calls, infra) to complete a user task with your AI feature.
Eval pipeline: A repeatable pipeline that scores model or agent outputs against test cases and business metrics before and after changes (see the sketch after this section).
Golden dataset: A curated collection of test cases with trusted answers used to judge model quality over time.
Hallucination rate: How often a model produces unsupported or incorrect facts relative to total responses.
Latency budget: The maximum response time you can spend across model calls, tools, and orchestration while meeting UX and business goals.
LLM-as-judge: Using a model to score another model's outputs against criteria, often faster and cheaper than human labeling.
Offline evals: Quality tests run on recorded or synthetic data without live users, giving fast, safe feedback on changes.
Online evals: Live experiments that measure model changes with real user traffic, often via A/B tests or shadow deployments.
Regression testing: Running automated checks to ensure a change doesn't reintroduce past bugs or quality drops in model behavior.
Task success rate: The percentage of user or agent tasks completed correctly without human rework or retries.
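A compact offline-eval sketch, assuming a tiny golden dataset and an exact-match judge: it scores a stubbed model, reports task success rate, and gates shipping on a baseline to catch regressions. An LLM-as-judge would replace the `judge` function when outputs are free-form; the dataset, baseline number, and intent labels are illustrative.

```python
# Hypothetical golden dataset: inputs with trusted expected answers.
GOLDEN_SET = [
    {"input": "Reset my password", "expected_intent": "account_recovery"},
    {"input": "Cancel my subscription", "expected_intent": "cancellation"},
    {"input": "Where is my invoice?", "expected_intent": "billing"},
]

def model_under_test(text: str) -> str:
    """Stand-in for the feature being evaluated (e.g., an intent-classification prompt)."""
    canned = {"Reset my password": "account_recovery", "Cancel my subscription": "cancellation"}
    return canned.get(text, "other")

def judge(predicted: str, expected: str) -> bool:
    """Exact match here; an LLM-as-judge would grade free-form answers against a rubric."""
    return predicted == expected

def run_offline_eval() -> float:
    """Score every golden case and return task success rate (0.0 to 1.0)."""
    passed = sum(
        judge(model_under_test(case["input"]), case["expected_intent"]) for case in GOLDEN_SET
    )
    return passed / len(GOLDEN_SET)

BASELINE_SUCCESS_RATE = 0.66  # the last shipped version's score (illustrative)

if __name__ == "__main__":
    rate = run_offline_eval()
    print(f"task success rate: {rate:.2f}")
    # Regression gate: block the change if quality drops below the baseline.
    if rate < BASELINE_SUCCESS_RATE:
        raise SystemExit("regression detected: do not ship")
```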
Safety
Content moderation: Screening inputs and outputs for toxicity, abuse, violence, or other policy-violating content.
Data leakage: Sensitive information being exposed to unauthorized users or external systems through model inputs, outputs, or logs.
Guardrails: Policies and technical controls that constrain what an AI can say or do, preventing harmful or out-of-scope behavior.
Jailbreak: A crafted input designed to bypass safety constraints and make the model produce disallowed content or actions.
PII redaction: Detecting and removing personally identifiable information from inputs, outputs, or stored data to prevent exposure (see the sketch after this section).
Prompt injection: An attempt by a user, or by content in a retrieved document, to override system instructions and make the model act outside intended bounds.
Red teaming: Systematic attempts to break or exploit an AI system to uncover safety and security weaknesses before attackers do.
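A simple redaction sketch, assuming regex patterns are enough for a demo: it strips emails, phone numbers, and card-like digit runs before anything is logged or sent onward. Production systems typically pair patterns like these with a trained PII detector and review workflows; the patterns and placeholders below are illustrative, not exhaustive.

```python
import re

# Illustrative patterns only; real deployments tune these and add a learned detector.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "CARD": re.compile(r"\b(?:\d[ -]*?){13,16}\b"),
}

def redact(text: str) -> str:
    """Replace detected PII with typed placeholders before storage or prompting."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def safe_log(user_message: str) -> None:
    """Logs (and model prompts) should only ever see the redacted form."""
    print(redact(user_message))

if __name__ == "__main__":
    safe_log("Call me at +1 415 555 0100 or email jane.doe@example.com")
```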
Go deeper with our courses, resources, and blog. Start learning free in the app—no fluff, just practical AI product skills.