Introduction
Hub guardrails provide real-time safety and quality checks for LLM requests and responses at the gateway level. They run centrally: before requests reach your LLM providers (pre-call) and after responses are received from LLMs but before they return to users (post-call).
Key Benefits:
- No Code Changes Required - Add safety checks without modifying application code
- Centralized Control - Manage security policies for all LLM traffic in one place
- Provider-Agnostic - Works with any LLM provider (OpenAI, Anthropic, Azure, etc.)
- Real-Time Protection - Blocks malicious requests and filters harmful responses
- Flexible Policies - Different guardrail configurations per pipeline
How Guardrails Work
Guardrails execute at two critical points in the request lifecycle:
```
User Request → Pre-call Guards → LLM Provider → Post-call Guards → User Response
                 (concurrent)                     (concurrent)
                      ↓                                ↓
              Block (403) or Warn              Block (403) or Warn
```
Execution Flow
1. User sends a request to Hub
2. Pre-call guards execute concurrently on the user's prompt
   - If any blocking guard fails → return HTTP 403
   - If any warning guard fails → add warning headers, continue
3. Request is forwarded to the LLM (if not blocked)
4. Post-call guards execute concurrently on the LLM's response
   - If any blocking guard fails → return HTTP 403
   - If any warning guard fails → add warning headers, continue
5. Response is returned to the user (if not blocked)
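The decision logic in the steps above can be sketched as a small Python model. This is illustrative only, not Hub's actual implementation; the `x-traceloop-guardrail-warning` header name comes from this doc, and the `GuardResult`/`apply_guards` names are invented for the example:

```python
# Illustrative model of the guard-decision flow, not Hub's actual code.
from dataclasses import dataclass

@dataclass
class GuardResult:
    name: str        # guard name, e.g. "pii-check"
    passed: bool     # did the evaluation pass?
    on_failure: str  # "block" or "warn"

def apply_guards(results: list[GuardResult]) -> tuple[int, dict]:
    """Combine concurrent guard results into (http_status, extra_headers)."""
    headers: dict[str, str] = {}
    for r in results:
        if r.passed:
            continue
        if r.on_failure == "block":
            # Any blocking failure short-circuits with 403 Forbidden
            return 403, {}
        # Warning failures accumulate into a response header and continue
        headers["x-traceloop-guardrail-warning"] = r.name
    return 200, headers
```

The same logic applies at both checkpoints; pre-call results gate the provider call, post-call results gate the response.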
Pre-call vs Post-call Guards
Pre-call guards run on the prompt messages before they reach the LLM. Use these for security checks, input validation, and preventing malicious prompts.
Post-call guards run on the LLM’s completion after the response is generated. Use these for output safety, content moderation, and preventing data leaks.
Many guards work well in both modes for comprehensive protection - for example, PII detection can prevent sensitive data in both user prompts and LLM responses.
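For instance, one way to apply PII detection on both sides is to define two guards that reuse the same evaluator (a sketched config following the guard schema shown later in this doc; the guard names here are illustrative):

```yaml
guards:
  - name: pii-input
    provider: traceloop
    evaluator_slug: pii-detector
    mode: pre_call       # scan user prompts
    on_failure: block
  - name: pii-output
    provider: traceloop
    evaluator_slug: pii-detector
    mode: post_call      # scan LLM responses
    on_failure: block
```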
Supported Request Types
Guardrails work across all three LLM endpoint types with appropriate logic for each:
| Request Type | Pre-call Guards | Post-call Guards | Streaming Support |
|---|---|---|---|
| `/chat/completions` | ✅ | ✅ | ✅ (post-call skipped) |
| `/completions` (legacy) | ✅ | ✅ | ✅ (post-call skipped) |
| `/embeddings` | ✅ | ❌ N/A | ❌ N/A |
- Chat and legacy completions support both pre-call and post-call guards. When streaming is enabled, post-call guards are skipped since the response is delivered incrementally.
- Embeddings only support pre-call guards, as there is no text completion to evaluate in the response.
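The support matrix above reduces to a small predicate. A sketch, with an illustrative function name:

```python
# Illustrative predicate encoding the endpoint support matrix above.
def post_call_guards_run(endpoint: str, streaming: bool) -> bool:
    """True when post-call guards execute for a given endpoint and mode."""
    if endpoint == "/embeddings":
        return False   # no text completion to evaluate
    if streaming:
        return False   # response is delivered incrementally, so skipped
    return endpoint in ("/chat/completions", "/completions")
```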
Core Concepts
Guards
A guard is a configured instance of an evaluator. Each guard defines:
- What to evaluate (evaluator type)
- When to evaluate (pre_call or post_call)
- How to respond to failures (block or warn)
- Configuration parameters (evaluator-specific settings)
Example guard configuration:
```yaml
guards:
  - name: pii-check
    provider: traceloop
    evaluator_slug: pii-detector
    mode: pre_call
    on_failure: block
    required: true
```
Evaluators
Evaluators are the detection algorithms that analyze text. Traceloop Hub includes 12 built-in evaluators across three categories:
Safety Evaluators (6):
- `pii-detector` - Detects personally identifiable information
- `secrets-detector` - Identifies exposed secrets and API keys
- `prompt-injection` - Detects prompt injection attacks
- `profanity-detector` - Detects profane language
- `sexism-detector` - Identifies sexist content
- `toxicity-detector` - Detects toxic/harmful content
Validation Evaluators (3):
- `regex-validator` - Custom pattern matching
- `json-validator` - JSON structure validation
- `sql-validator` - SQL syntax validation
Quality Evaluators (3):
- `tone-detection` - Analyzes communication tone
- `prompt-perplexity` - Measures prompt quality
- `uncertainty-detector` - Detects uncertain language
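Validation evaluators typically need extra parameters. For example, a regex guard might be configured like this (a sketch: the `config` and `pattern` keys are assumptions for illustration, check the evaluator's reference for the actual field names):

```yaml
guards:
  - name: no-internal-ids
    provider: traceloop
    evaluator_slug: regex-validator
    mode: post_call
    on_failure: warn
    config:
      pattern: "EMP-[0-9]{6}"   # flag leaked internal employee IDs
```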
Execution Modes
Guards can run in two modes:
pre_call Mode:
- Executes on user input before the LLM call
- Best for: security checks, input validation, attack prevention
- Examples: prompt injection detection, input PII filtering
post_call Mode:
- Executes on LLM output after the LLM responds
- Best for: output safety, content moderation, quality checks
- Examples: response PII filtering, secrets detection, tone validation
Failure Handling
When a guard evaluation fails, the system responds based on the on_failure setting:
block Mode:
- Returns HTTP 403 Forbidden to the user
- Includes details about which guard failed
- Prevents the request/response from proceeding
warn Mode:
- Adds an `x-traceloop-guardrail-warning` header to the response
- Allows the request/response to continue
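Client-side, the two failure modes are distinguishable by status code and header. A minimal sketch, assuming the helper name and outcome labels (which are illustrative):

```python
# Minimal sketch of client-side handling for the two failure modes.
def classify_hub_response(status_code: int, headers: dict) -> str:
    """Map a Hub response to a guardrail outcome label."""
    if status_code == 403:
        return "blocked"                 # a blocking guard failed
    if "x-traceloop-guardrail-warning" in headers:
        return "passed-with-warning"     # a warning guard failed
    return "passed"
```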
Required Flag (Fail-Closed vs Fail-Open)
The required flag determines behavior when the evaluator service is unavailable, times out, or errors.
Default: false
required: true (Fail-Closed):
- If evaluator is unavailable → treat as failure
- Use for security-critical guards
- Ensures zero gaps in protection
- Example: PII detection in healthcare apps
required: false (Fail-Open):
- If evaluator is unavailable → continue anyway
- Use for quality checks and non-critical guards
- Prioritizes availability over enforcement
- Example: Tone detection in internal tools
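The fail-closed vs fail-open behavior can be summarized in a few lines (an illustrative model; the function name and string labels are invented for the example):

```python
# Illustrative model of the `required` flag: what happens when the
# evaluator service itself errors out or is unreachable.
def guard_outcome(evaluator_error: bool, passed: bool, required: bool) -> str:
    """Return 'pass' or 'fail' for a single guard evaluation."""
    if evaluator_error:
        # required=true fails closed; required=false fails open
        return "fail" if required else "pass"
    return "pass" if passed else "fail"
```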
Providers
Providers are the services that execute evaluations. Currently, Hub supports the Traceloop provider, which offers all 12 evaluators through the Traceloop API.
Provider configuration example:
```yaml
guardrails:
  providers:
    - name: traceloop
      api_base: https://api.traceloop.com
      api_key: ${TRACELOOP_API_KEY}
```
Quick Start Example
Here’s a minimal configuration that adds PII detection and prompt injection protection:
```yaml
guardrails:
  providers:
    - name: traceloop
      api_base: https://api.traceloop.com
      api_key: ${TRACELOOP_API_KEY}
  guards:
    - name: pii-check
      provider: traceloop
      evaluator_slug: pii-detector
      mode: pre_call
      on_failure: block
      required: true
    - name: injection-check
      provider: traceloop
      evaluator_slug: prompt-injection
      mode: pre_call
      on_failure: block
      required: true

pipelines:
  - name: default
    type: chat
    guards:
      - pii-check
      - injection-check
    plugins:
      - model-router:
          models: [gpt-4]
```
This configuration:
- Checks all user prompts for PII (blocks if detected)
- Checks all user prompts for injection attacks (blocks if detected)
- Runs both guards concurrently for minimal latency
- Fails closed (blocks if evaluator unavailable)
Observability
Every guard evaluation creates an OpenTelemetry span with attributes:
- `gen_ai.guardrail.name` - Guard name
- `gen_ai.guardrail.status` - PASSED, FAILED, or ERROR
- `gen_ai.guardrail.duration` - Execution time in milliseconds
- `gen_ai.guardrail.error.type` - Error category (if failed)
- `gen_ai.guardrail.input` - Guard input text
The spans will be visible in the Traceloop Trace table. Use them to monitor guardrail performance, track failures, and optimize configurations.
Next Steps