Gateway Guardrails let you attach policy-driven safety, privacy, and reliability checks to any API key. Bound policies fire on every completion, including /v1/completions (streaming + non-streaming), /v1/images, /v1/code, /v1/documents, the playground, and embed widgets.
Strict opt-in contract
Nothing is live until you bind a policy to an API key. New keys, all existing keys, and every Clerk-session caller bypass guardrails entirely. There is no automatic org or user default: guardrails are an enforcement boundary, not a routing preference, so the gateway never enforces something you didn’t explicitly opt into. This means:
- Existing API keys are unaffected until you explicitly bind a policy, so enabling guardrails does not change behavior for existing traffic.
- No fallback chain. Clearing a binding (policy_id: null) drops the key back to no enforcement.
- One policy per key. A key can be bound to at most one policy; a policy can be bound to any number of keys.
The five built-in protections
Each protection is a pure evaluator: identical inputs always produce identical outputs. They are vendor-neutral and never name an upstream provider in customer-facing strings.
pii_redactor — Hide personal info
Scrubs emails, phone numbers, US Social Security numbers, Luhn-valid credit cards, and driver-license-shaped IDs from the prompt before any model call.
Allowed verdicts: redact (default), flag (logged only, prompt untouched).
Phase: input only.
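As a rough illustration of what "Luhn-valid" redaction involves, here is a minimal sketch covering only emails and card numbers. The regular expressions, the function names, and the kind labels email and credit_card are assumptions made for this example; only the [REDACTED:<kind>] replacement shape comes from this page, and the gateway's actual detectors are not shown here.

```python
import re
from typing import Tuple

# Illustrative patterns only; the gateway's real detectors are not documented here.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_pii(prompt: str) -> Tuple[str, int]:
    """Replace emails and Luhn-valid card numbers; return (text, redacted_count)."""
    count = 0

    def card_sub(m: "re.Match") -> str:
        nonlocal count
        digits = re.sub(r"\D", "", m.group())
        if luhn_valid(digits):
            count += 1
            return "[REDACTED:credit_card]"  # hypothetical kind label
        return m.group()  # digit runs that fail the checksum are left untouched

    def email_sub(m: "re.Match") -> str:
        nonlocal count
        count += 1
        return "[REDACTED:email]"  # hypothetical kind label

    text = CARD_RE.sub(card_sub, prompt)
    text = EMAIL_RE.sub(email_sub, text)
    return text, count
```

The checksum step is what keeps order numbers and other long digit runs from being redacted as cards: only sequences that pass the Luhn check are replaced.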
prompt_injection — Block jailbreak attempts
Catches the canonical OWASP-LLM01 patterns: “ignore previous instructions”, role takeovers, fake system messages, prompt exfiltration probes.
Allowed verdicts: deny (default; surfaces a vendor-neutral 422 to the caller), flag.
Phase: input only.
json_repair — Always return valid JSON
Detects malformed JSON in model output. On a repair verdict the runner makes a single repair attempt and falls back to the original buffer if it can’t produce valid JSON.
Allowed verdicts: repair (default), flag.
Phase: output only.
Config: { "fenced": true } (default true). When set, the evaluator first strips a fenced json block before parsing, mirroring the most common LLM output shape.
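The strip-then-parse-then-repair-once flow described for json_repair can be sketched as follows. This is an assumption-laden illustration, not the gateway's implementation: the function names are hypothetical, the repair step is injected as a plain callable because the real repair pass is not described on this page, and returning the buffer unchanged when the JSON is already valid is a guess.

```python
import json
import re
from typing import Callable

FENCE = "`" * 3  # three backticks, built this way to keep the example easy to embed
FENCE_RE = re.compile(
    r"^\s*" + FENCE + r"(?:json)?[ \t]*\n(.*?)\n?[ \t]*" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_fence(buffer: str) -> str:
    """Return the body if the whole buffer is one fenced block, else the buffer."""
    m = FENCE_RE.match(buffer)
    return m.group(1) if m else buffer


def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def json_repair(buffer: str, repair: Callable[[str], str], fenced: bool = True) -> str:
    """Make one repair attempt; fall back to the original buffer on failure."""
    candidate = strip_fence(buffer) if fenced else buffer
    if is_valid_json(candidate):
        return buffer  # assumption: nothing fires, so the output is left as-is
    repaired = repair(candidate)  # single attempt, never retried
    return repaired if is_valid_json(repaired) else buffer
```

Passing the repair step in as a callable keeps the sketch deterministic and makes the "single attempt, then fall back" contract easy to see in isolation.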
max_length — Cap input or output length
Truncates the buffer to a configured cap and appends a small …[truncated] marker.
Allowed verdicts: truncate (default), flag.
Phase: input and/or output.
Config (one of):
- { "maxChars": 16000 }: hard character cap.
- { "maxTokens": 4000, "charsPerToken": 4 }: token cap (default 4 chars/token).
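To make the two config shapes concrete: a token cap of 4000 at 4 characters per token works out to 4000 × 4 = 16000 characters, the same limit as the maxChars example. The sketch below assumes that the token cap is converted to characters this way and that the marker is appended after the cap rather than counted inside it; neither detail is confirmed on this page, and the function names are hypothetical.

```python
MARKER = "…[truncated]"


def resolve_cap(config: dict) -> int:
    """Turn either config shape into a character cap."""
    if "maxChars" in config:
        return int(config["maxChars"])
    # Assumption: the token cap is approximated as maxTokens * charsPerToken.
    return int(config["maxTokens"]) * int(config.get("charsPerToken", 4))


def truncate(buffer: str, config: dict) -> str:
    """Trim the buffer to the cap and append the marker when anything was cut."""
    cap = resolve_cap(config)
    if len(buffer) <= cap:
        return buffer
    # Assumption: the marker is added on top of the cap, not counted within it.
    return buffer[:cap] + MARKER
```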
profanity — Keep it clean
Detects common profanity in either direction. Defaults to flag (visibility only) so you can review activity before deciding to block.
Allowed verdicts: flag (default), deny.
Phase: input and/or output.
The five verdicts
A rule’s verdict tells the runner what to do when the protection fires. The runner aggregates the worst verdict across all matched rules for the phase and applies it once:
- flag: Pass-through. The match is recorded in the execution log; nothing else changes.
- redact: Replace the matched text with [REDACTED:<kind>] before forwarding.
- truncate: Trim to the configured limit and append …[truncated].
- repair: Run a single repair pass on the buffer. Used by json_repair only.
- deny: Block the request. On input the gateway returns a 422 invalid_request_error with code: "guardrail_violation" before any model call. On output the response content is replaced with the deny reason and artifacts are dropped.
A deny carries the guardrail_violation code so your application can branch on policy enforcement without knowing which detector fired.
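"Aggregates the worst verdict" can be pictured as taking a maximum over a severity ranking. The ranking below is an assumption: it follows the order in which the five verdicts are listed in this section, which may or may not be the order the runner uses. The function name is hypothetical.

```python
from typing import List, Optional

# Assumed severity order, least to most severe, taken from the listing order.
SEVERITY = ["flag", "redact", "truncate", "repair", "deny"]


def worst_verdict(matched: List[str]) -> Optional[str]:
    """Return the most severe verdict among matched rules, or None if none fired."""
    if not matched:
        return None
    return max(matched, key=SEVERITY.index)
```

Under this ranking, a phase where one rule flags and another redacts is handled as a redact, and any deny wins outright.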
Quick start — dashboard
- Open Dashboard → Guardrails and pick a preset (or click New policy for a blank policy).
- Add protections from the picker; each card lets you choose the phase and verdict.
- Use the Test bench in the right rail to replay a fixture prompt + optional model output through the policy. You’ll see the verdict trail per rule.
- In the Bindings panel below the editor, toggle the policy on each API key you want to enforce against.
Quick start — SDK
The management endpoints (/api/v1/guardrail-policies/*, /api/v1/keys/{id}/guardrails) require the billing API key scope, matching the routing-preferences convention. Mint a key with this scope explicitly if you plan to manage policies programmatically; the default completions scope is sufficient for normal traffic.
Error envelope
When an input-phase deny fires, the API returns a 422 invalid_request_error with code: "guardrail_violation". For streaming completions the same envelope is emitted as an error SSE event. The matched pattern (for example ignore_previous) is intentionally not included so attackers can’t probe which detector fired.
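A client might branch on a guardrail deny roughly as follows. The envelope's exact JSON shape is not reproduced in this section, so the sketch looks for the code either at the top level or nested under an "error" key; both locations are assumptions, as are the function names and the sample bodies in the usage note.

```python
from typing import Optional


def error_code(body: dict) -> Optional[str]:
    """Find an error code at the top level or under "error" (assumed locations)."""
    nested = body.get("error")
    if isinstance(nested, dict) and "code" in nested:
        return nested["code"]
    return body.get("code")


def is_guardrail_violation(status: int, body: dict) -> bool:
    """True when a response looks like an input-phase guardrail deny."""
    return status == 422 and error_code(body) == "guardrail_violation"
```

For example, is_guardrail_violation(422, {"error": {"code": "guardrail_violation"}}) is true under these assumptions, while any other status or code falls through to the application's normal error handling.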
Streaming behavior
Output guardrails run after the full token stream has been collected on the server. This means:
- Tokens stream live as the model produces them. The client sees the raw, pre-guard text in real time.
- The done event carries the guarded buffer. Any client that re-renders the message body from the done event (the official SDK, the playground, the embed widget) ends up displaying the policy-compliant version.
- On deny, an additional error SSE event is emitted before done. The done.content is replaced with the deny reason and any artifacts produced during the turn are dropped.
If your client renders streamed tokens directly and does not re-render from done.content, plan on switching to non-streaming mode (stream: false) when you need byte-exact policy enforcement on every visible character.
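The re-render-from-done pattern can be sketched with a client loop over already-parsed SSE events. The done and error event names and done.content come from this page; the "token" event name, its "text" field, and the function name are hypothetical stand-ins for whatever the stream actually uses.

```python
from typing import Iterable, Optional, Tuple


def render_stream(events: Iterable[Tuple[str, dict]]) -> Tuple[str, Optional[dict]]:
    """Consume (event_name, data) pairs and return (final_text, error)."""
    live = []     # raw, pre-guard text shown while tokens arrive
    final = None  # guarded buffer delivered by the done event
    error = None
    for name, data in events:
        if name == "token":            # hypothetical event name
            live.append(data["text"])  # hypothetical field name
        elif name == "error":
            error = data
        elif name == "done":
            final = data["content"]    # replaces whatever was streamed live
    text = final if final is not None else "".join(live)
    return text, error
```

A client built this way may briefly show pre-guard text, but the message it keeps is always the one carried by done.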
Team policies
scope: "team" policies are visible to every member of the owning organization and require the manageWebhooks permission to create/update (re-used as the generic team-config gate; there is no dedicated manageGuardrails bit today).
Personal policies are only visible to their author. Team keys can only be bound to team policies in the same org — the binding endpoint enforces this with guardrail_policy_not_bindable when the scopes don’t match.
Audit & telemetry
Every evaluation writes an immutable row to guardrail_executions (90-day retention, enforced by the BullMQ retention worker). Each row captures:
- policy_id, key_id, user_id: who/what triggered the evaluation.
- guardrail_id, phase, verdict: the rule and outcome.
- matched_pattern, redacted_count: bounded forensic context (no raw prompts).
- latency_ms: per-rule overhead, surfaced in the dashboard’s activity feed.
Query these rows with theo.guardrails.executions.list({ keyId, policyId, since }) or GET /api/v1/guardrail-executions. The dashboard’s Recent activity panel renders the same rows in plain language.
What guardrails do not change
- Model allowlists, routing rules, skills, brand soul, response style, and personality all keep working independently. Guardrails layer around the request/response path; they don’t override other key-level features.
- Existing keys without a binding stay unchanged. You opt in per key.
- Cost is unaffected. Builtins are pure evaluators (sub-millisecond); json_repair’s repair pass adds a single small call only when malformed JSON is detected.
