Gateway Guardrails - Theo API Docs

Gateway Guardrails let you attach policy-driven safety, privacy, and reliability checks to any API key. Bound policies fire on every completion — including /v1/completions (streaming + non-streaming), /v1/images, /v1/code, /v1/documents, the playground, and embed widgets.

Strict opt-in contract

Nothing is live until you bind a policy to an API key. New keys, all existing keys, and every Clerk-session caller bypass guardrails entirely. There is no automatic org or user default — guardrails are an enforcement boundary, not a routing preference, so the gateway never enforces something you didn’t explicitly opt into. This means:

Existing API keys are unaffected until you explicitly bind a policy, so enabling the feature changes nothing for current traffic.
No fallback chain. Clearing a binding (policy_id: null) drops the key back to no enforcement.
Per-key, one policy at a time. A key can be bound to at most one policy; a policy can be bound to any number of keys.
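
The contract above can be sketched as a lookup with no fallback. This is a minimal illustration, not the gateway's implementation: the in-memory table, the key and policy ids, and the resolvePolicy helper are all hypothetical.

```typescript
// Hypothetical in-memory view of key -> policy bindings (ids are illustrative).
// A key maps to at most one policy; one policy may appear under many keys.
const keyBindings: Record<string, string | null> = {
  key_a: "pol_prod",
  key_b: "pol_prod",
  key_c: null, // binding cleared with policy_id: null
};

// Returns the policy id to enforce, or null when the request bypasses guardrails.
function resolvePolicy(keyId: string): string | null {
  const isKnown = Object.prototype.hasOwnProperty.call(keyBindings, keyId);
  // No org-level or user-level default: an unbound or unknown key is unguarded.
  return isKnown ? keyBindings[keyId] ?? null : null;
}
```

A cleared binding and a key that was never bound resolve the same way: both return null, which means no enforcement.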

The five built-in protections

Each protection is a pure evaluator: identical inputs always produce identical outputs. They are vendor-neutral and never name an upstream provider in customer-facing strings.

pii_redactor — Hide personal info

Scrubs emails, phone numbers, US Social Security numbers, Luhn-valid credit cards, and driver-license-shaped IDs from the prompt before any model call.
Allowed verdicts: redact (default), flag (logged only, prompt untouched).
Phase: input only.

{ "guardrail_id": "pii_redactor", "phase": "input", "verdict": "redact" }
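
To make "Luhn-valid" concrete, here is a rough sketch of how a redactor might combine the standard Luhn checksum with the [REDACTED:<kind>] replacement format. The regular expressions, the kind labels email and credit_card, and the function names are assumptions for illustration; the gateway's actual detectors are not documented here.

```typescript
// Standard Luhn checksum over a string of digits.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (d < 0 || d > 9) return false; // non-digit input
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return digits.length > 0 && sum % 10 === 0;
}

// Illustrative patterns only (assumptions, not the gateway's rules).
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const CARD_RE = /\b\d(?:[ -]?\d){12,18}\b/g;

function redactPii(prompt: string): { text: string; redactedCount: number } {
  let redactedCount = 0;
  let text = prompt.replace(EMAIL_RE, () => {
    redactedCount++;
    return "[REDACTED:email]";
  });
  text = text.replace(CARD_RE, (match) => {
    const digits = match.replace(/[ -]/g, "");
    if (!passesLuhn(digits)) return match; // card-shaped but not Luhn-valid: keep
    redactedCount++;
    return "[REDACTED:credit_card]";
  });
  return { text, redactedCount };
}
```

The Luhn gate is what keeps card-shaped order numbers and reference ids from being redacted by mistake.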

prompt_injection — Block jailbreak attempts

Catches the canonical OWASP-LLM01 patterns: “ignore previous instructions”, role takeovers, fake system messages, prompt exfiltration probes.
Allowed verdicts: deny (default; surfaces a vendor-neutral 422 to the caller), flag.
Phase: input only.

{ "guardrail_id": "prompt_injection", "phase": "input", "verdict": "deny" }
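
As a rough picture of pattern-based detection, the sketch below matches a prompt against a small table of regular expressions and returns the id of the first hit. Only the ignore_previous id appears in these docs; the other ids and all of the expressions are hypothetical and far simpler than a production detector.

```typescript
interface InjectionPattern {
  id: string;
  re: RegExp;
}

// Hypothetical pattern table (no global flag, so test() keeps no state).
const INJECTION_PATTERNS: InjectionPattern[] = [
  { id: "ignore_previous", re: /ignore\s+(?:all\s+)?(?:previous|prior)\s+instructions/i },
  { id: "fake_system_message", re: /^\s*system\s*:/im },
  { id: "prompt_exfiltration", re: /(?:reveal|print|repeat)\s+(?:your\s+)?system\s+prompt/i },
];

// Returns the id of the first matching pattern, or null when nothing matches.
function detectInjection(prompt: string): string | null {
  for (const pattern of INJECTION_PATTERNS) {
    if (pattern.re.test(prompt)) return pattern.id;
  }
  return null;
}
```

Because the evaluator depends only on its input, the same prompt always yields the same result, which is the "pure evaluator" property described above.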

json_repair — Always return valid JSON

Detects malformed JSON in model output. On a repair verdict the runner makes a single repair attempt and falls back to the original buffer if it can’t produce valid JSON.
Allowed verdicts: repair (default), flag.
Phase: output only.
Config: { "fenced": true } (default true). When set, the evaluator first strips a fenced json block before parsing, mirroring the most common LLM output shape.

{ "guardrail_id": "json_repair", "phase": "output", "verdict": "repair", "config": { "fenced": true } }
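
The strip, parse, repair-once, fall-back flow might look roughly like the sketch below. The repair step here is a stand-in heuristic (dropping trailing commas); the docs only say the real pass is a single small call, so treat the heuristic and all function names as assumptions.

```typescript
// Built from single backticks so the sketch can live inside a fenced block.
const FENCE = "`" + "`" + "`";
const FENCED_JSON_RE = new RegExp(
  "^" + FENCE + "(?:json)?[ \\t]*\\n([\\s\\S]*?)\\n?" + FENCE + "$"
);

// Returns the body of a surrounding json fence, or the buffer unchanged.
function stripJsonFence(buffer: string): string {
  const m = FENCED_JSON_RE.exec(buffer.trim());
  return m !== null && m[1] !== undefined ? m[1] : buffer;
}

function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

function repairJson(buffer: string, fenced: boolean = true): { output: string; repaired: boolean } {
  const candidate = fenced ? stripJsonFence(buffer) : buffer;
  // Already valid: the protection does not fire and the buffer passes through.
  if (isValidJson(candidate)) return { output: buffer, repaired: false };
  // Single repair attempt (stand-in heuristic): drop trailing commas.
  const attempt = candidate.replace(/,\s*([}\]])/g, "$1");
  if (isValidJson(attempt)) return { output: attempt, repaired: true };
  // Repair failed: fall back to the original buffer.
  return { output: buffer, repaired: false };
}
```

Note the two distinct pass-through cases: valid JSON never triggers a repair, and an unrepairable buffer is returned untouched rather than replaced with a partial fix.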

max_length — Cap input or output length

Truncates the buffer to a configured cap and appends a small …[truncated] marker.
Allowed verdicts: truncate (default), flag.
Phase: input and/or output.
Config (one of):

{ "maxChars": 16000 } — hard character cap.
{ "maxTokens": 4000, "charsPerToken": 4 } — token cap (default 4 chars/token).

{ "guardrail_id": "max_length", "phase": "output", "verdict": "truncate", "config": { "maxTokens": 4000 } }
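
The two config shapes reduce to a single character cap. The sketch below assumes that the token cap is converted as maxTokens × charsPerToken, that length is measured in JavaScript string units, and that the marker is appended after the cap rather than counted against it; none of these details are confirmed by the docs.

```typescript
interface MaxLengthConfig {
  maxChars?: number;
  maxTokens?: number;
  charsPerToken?: number;
}

const TRUNCATION_MARKER = "…[truncated]";

// Resolves either config shape to a character cap (default 4 chars/token).
function resolveCharCap(config: MaxLengthConfig): number {
  if (config.maxChars !== undefined) return config.maxChars;
  if (config.maxTokens !== undefined) {
    return config.maxTokens * (config.charsPerToken ?? 4);
  }
  throw new Error("max_length needs maxChars or maxTokens");
}

// Trims the buffer to the cap and appends the marker only when it was cut.
function applyMaxLength(buffer: string, config: MaxLengthConfig): string {
  const cap = resolveCharCap(config);
  if (buffer.length <= cap) return buffer;
  return buffer.slice(0, cap) + TRUNCATION_MARKER;
}
```

Under these assumptions, the example rule above ({ "maxTokens": 4000 }) behaves like a 16,000-character cap.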

profanity — Keep it clean

Detects common profanity in either direction. Defaults to flag (visibility only) so you can review activity before deciding to block.
Allowed verdicts: flag (default), deny.
Phase: input and/or output.

{ "guardrail_id": "profanity", "phase": "output", "verdict": "flag" }

The five verdicts

A rule’s verdict tells the runner what to do when the protection fires. The runner aggregates the worst verdict across all matched rules for the phase and applies it once:

flag — Pass-through. The match is recorded in the execution log; nothing else changes.
redact — Replace the matched text with [REDACTED:<kind>] before forwarding.
truncate — Trim to the configured limit and append …[truncated].
repair — Run a single repair pass on the buffer. Used by json_repair only.
deny — Block the request. On input the gateway returns a 422 invalid_request_error with code: "guardrail_violation" before any model call. On output the response content is replaced with the deny reason and artifacts are dropped.
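
Worst-verdict aggregation can be sketched as a maximum over a severity ranking. The ranking below simply follows the order of the list above, with flag as the mildest and deny as the most severe; the relative order of redact, truncate, and repair is an assumption, as are the type and function names.

```typescript
type GuardVerdict = "flag" | "redact" | "truncate" | "repair" | "deny";

// Assumed severity ranking, least to most severe.
const VERDICT_SEVERITY: Record<GuardVerdict, number> = {
  flag: 0,
  redact: 1,
  truncate: 2,
  repair: 3,
  deny: 4,
};

// Returns the most severe verdict among matched rules, or null if none matched.
function worstVerdict(matched: GuardVerdict[]): GuardVerdict | null {
  let worst: GuardVerdict | null = null;
  for (const verdict of matched) {
    if (worst === null || VERDICT_SEVERITY[verdict] > VERDICT_SEVERITY[worst]) {
      worst = verdict;
    }
  }
  return worst;
}
```

With this ranking, a phase in which one rule flags and another denies resolves to a single deny.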

Customer-facing messages never name an upstream provider — the SDK surfaces a stable guardrail_violation code so your application can branch on policy enforcement without knowing which detector fired.

Quick start — dashboard

Open Dashboard → Guardrails and pick a preset (or click New policy for a blank policy).
Add protections from the picker; each card lets you choose the phase and verdict.
Use the Test bench in the right rail to replay a fixture prompt + optional model output through the policy. You’ll see the verdict trail per rule.
In the Bindings panel below the editor, toggle the policy on each API key you want to enforce against.

That’s it. The policy is live for every bound key.

Quick start — SDK

import { Theo } from "@hitheo/sdk";

const theo = new Theo({ apiKey: process.env.THEO_API_KEY! });

// 1. (Optional) Start from a preset
const presets = await theo.guardrails.presets.list();
const enterprise = presets.find((p) => p.id === "enterprise-default")!;

// 2. Create a policy
const policy = await theo.guardrails.policies.create({
  name: "Production defaults",
  description: "PII redaction + jailbreak deny + JSON repair on output.",
  rules: enterprise.rules,
});

// 3. Bind it to an API key
await theo.keys.setGuardrailPolicy("<key-uuid>", policy.id);

// 4. Verify the binding
const bindings = await theo.guardrails.policies.bindings(policy.id);
console.log(`Policy is enforcing on ${bindings.count} key(s).`);

// 5. Tail the audit log
const recent = await theo.guardrails.executions.list({ policyId: policy.id, limit: 20 });

The management endpoints (/api/v1/guardrail-policies/*, /api/v1/keys/{id}/guardrails) require the billing API key scope — matching the routing-preferences convention. Mint a key with this scope explicitly if you plan to manage policies programmatically; the default completions scope is sufficient for normal traffic.

Error envelope

When an input-phase deny fires, the API returns:

{
  "error": {
    "type": "invalid_request_error",
    "code": "guardrail_violation",
    "message": "Request blocked by your active guardrail policy.",
    "request_id": "req_..."
  }
}

HTTP status is 422. For streaming completions the same envelope is emitted as an error SSE event:

event: error
data: { "error": { "type": "invalid_request_error", "code": "guardrail_violation", ... } }

The deny reason is the customer-facing copy the protection surfaced; the matched-pattern id (e.g. ignore_previous) is intentionally not included so attackers can’t probe which detector fired.
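
If you call the REST API directly, a small check on the parsed body is enough to branch on policy enforcement. The sketch below relies only on the envelope fields shown above; the isGuardrailViolation helper name is hypothetical, and you would typically call it after seeing a 422 status or on the data payload of an error SSE event.

```typescript
// Shape of the envelope shown above (request_id treated as optional here).
interface GuardrailErrorEnvelope {
  error: {
    type: string;
    code: string;
    message: string;
    request_id?: string;
  };
}

// True when a parsed response body carries the guardrail_violation code.
function isGuardrailViolation(body: unknown): boolean {
  if (typeof body !== "object" || body === null) return false;
  const err = (body as { error?: unknown }).error;
  if (typeof err !== "object" || err === null) return false;
  return (err as { code?: unknown }).code === "guardrail_violation";
}

// Picks user-facing copy without knowing which detector fired.
function describeFailure(body: unknown): string {
  if (isGuardrailViolation(body)) {
    return (body as GuardrailErrorEnvelope).error.message;
  }
  return "Request failed.";
}
```

Branching on the code rather than the message text keeps the client stable if the customer-facing copy changes.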

Streaming behavior

Output guardrails run after the full token stream has been collected on the server. This means:

Tokens stream live as the model produces them. The client sees the raw, pre-guard text in real time.
The done event carries the guarded buffer. Any client that re-renders the message body from the done event (the official SDK, the playground, the embed widget) ends up displaying the policy-compliant version.
On deny, an additional error SSE event is emitted before done. The done.content is replaced with the deny reason and any artifacts produced during the turn are dropped.

If your client renders tokens live and doesn’t re-read done.content, plan on switching to non-streaming mode (stream: false) when you need byte-exact policy enforcement on every visible character.
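
A client that wants the guarded text while still streaming can treat live tokens as provisional and overwrite them when done arrives. The reducer below is a sketch over already-parsed events: the error and done events come from this page, while the token event name, the event shapes, and the deny text in the usage example are assumptions.

```typescript
type GuardStreamEvent =
  | { kind: "token"; text: string } // assumed name for incremental text
  | { kind: "error"; code: string }
  | { kind: "done"; content: string };

interface RenderState {
  visible: string; // what the UI currently shows
  blocked: boolean; // an error event was seen
  finished: boolean;
}

function applyStreamEvent(state: RenderState, ev: GuardStreamEvent): RenderState {
  if (ev.kind === "token") {
    // Raw, pre-guard text: shown live but treated as provisional.
    return { visible: state.visible + ev.text, blocked: state.blocked, finished: false };
  }
  if (ev.kind === "error") {
    return { visible: state.visible, blocked: true, finished: state.finished };
  }
  // done: replace everything with the guarded buffer.
  return { visible: ev.content, blocked: state.blocked, finished: true };
}

function renderStream(events: GuardStreamEvent[]): RenderState {
  let state: RenderState = { visible: "", blocked: false, finished: false };
  for (const ev of events) {
    state = applyStreamEvent(state, ev);
  }
  return state;
}
```

The raw text is still visible until done arrives, so this pattern fits UIs that can tolerate a brief pre-guard view; otherwise stream: false remains the stricter option.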

Team policies

scope: "team" policies are visible to every member of the owning organization and require the manageWebhooks permission to create/update (re-used as the generic team-config gate; there is no dedicated manageGuardrails bit today). Personal policies are only visible to their author. Team keys can only be bound to team policies in the same org — the binding endpoint enforces this with guardrail_policy_not_bindable when the scopes don’t match.
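
The binding rule can be pictured as a small scope check. This sketch encodes only what the paragraph above states; the type names, the orgId field, and the treatment of personal keys (allowed only with personal policies, with no ownership check modeled) are assumptions.

```typescript
type OwnerScope = "personal" | "team";

interface ScopedKey {
  scope: OwnerScope;
  orgId?: string;
}

interface ScopedPolicy {
  scope: OwnerScope;
  orgId?: string;
}

type BindCheck = "ok" | "guardrail_policy_not_bindable";

function checkBindable(key: ScopedKey, policy: ScopedPolicy): BindCheck {
  if (key.scope === "team") {
    // Team keys: team policies in the same org only.
    const sameOrg =
      policy.scope === "team" &&
      policy.orgId !== undefined &&
      policy.orgId === key.orgId;
    return sameOrg ? "ok" : "guardrail_policy_not_bindable";
  }
  // Personal keys: assumed to require a personal policy (scopes must match).
  return policy.scope === "personal" ? "ok" : "guardrail_policy_not_bindable";
}
```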

Audit & telemetry

Every evaluation writes an immutable row to guardrail_executions (90-day retention, enforced by the BullMQ retention worker). Each row captures:

policy_id, key_id, user_id — who/what triggered the evaluation.
guardrail_id, phase, verdict — the rule and outcome.
matched_pattern, redacted_count — bounded forensic context (no raw prompts).
latency_ms — per-rule overhead, surfaced in the dashboard’s activity feed.

Tail the log via theo.guardrails.executions.list({ keyId, policyId, since }) or GET /api/v1/guardrail-executions. The dashboard’s Recent activity panel renders the same rows in plain language.
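
Once you have a page of execution rows, a small fold gives per-rule counts and overhead. The field names below match the list above, but their TypeScript types, the RuleStats shape, and the summarizeByGuardrail helper are assumptions; the response shape of executions.list is not documented on this page.

```typescript
// Field names from the list above; the types are assumed.
interface GuardrailExecutionRow {
  policy_id: string;
  key_id: string;
  user_id: string;
  guardrail_id: string;
  phase: "input" | "output";
  verdict: string;
  matched_pattern: string | null;
  redacted_count: number;
  latency_ms: number;
}

interface RuleStats {
  count: number;
  totalLatencyMs: number;
  redacted: number;
}

// Groups rows by guardrail_id and accumulates counts and latency.
function summarizeByGuardrail(rows: GuardrailExecutionRow[]): Record<string, RuleStats> {
  const stats: Record<string, RuleStats> = {};
  for (const row of rows) {
    const current = stats[row.guardrail_id] ?? { count: 0, totalLatencyMs: 0, redacted: 0 };
    stats[row.guardrail_id] = {
      count: current.count + 1,
      totalLatencyMs: current.totalLatencyMs + row.latency_ms,
      redacted: current.redacted + row.redacted_count,
    };
  }
  return stats;
}
```

Dividing totalLatencyMs by count gives the average per-rule overhead, which is a quick way to confirm the sub-millisecond expectation for your own traffic.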

What guardrails do not change

Model allowlists, routing rules, skills, brand soul, response style, and personality all keep working independently. Guardrails layer around the request/response path; they don’t override other key-level features.
Existing keys without a binding stay unchanged. You opt in per key.
Cost is effectively unaffected. Builtins are pure evaluators (sub-millisecond); the one exception is json_repair, whose repair pass adds a single small call, and only when malformed JSON is detected.
