## How It Works
- Theo generates a deterministic cache key from the request parameters (prompt, mode, skills, tools)
- It checks the server-side cache for a matching entry
- On a hit, it returns the cached response immediately: zero cost, sub-10ms latency
- On a miss, it runs the full orchestration pipeline, caches the result, and returns it
Responses served from cache include `"_cached": true` in the response body so you can detect cache hits.
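The hit/miss flow can be sketched in a few lines. This is a hypothetical in-memory model, not Theo's server-side implementation; `run_pipeline` and `handle_request` are stand-in names, and the tuple key stands in for the deterministic cache key described below.

```python
# Hypothetical in-memory sketch of the hit/miss flow; the real cache is
# server-side, and run_pipeline stands in for the full orchestration pipeline.
_cache: dict[tuple, dict] = {}

def run_pipeline(prompt: str, mode: str) -> dict:
    # Placeholder for the expensive orchestration work.
    return {"output": f"response to: {prompt}"}

def handle_request(prompt: str, mode: str = "auto") -> dict:
    key = (prompt, mode)                          # stand-in for the real cache key
    if key in _cache:
        return {**_cache[key], "_cached": True}   # hit: return immediately
    result = run_pipeline(prompt, mode)           # miss: run the pipeline...
    _cache[key] = result                          # ...cache the result...
    return {**result, "_cached": False}           # ...and return it
```

Checking the `_cached` flag on the returned body is how a client distinguishes a cached response from a freshly computed one.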
## Cache Key Computation
The cache key is a deterministic hash of:

- Prompt text (exact match)
- Mode (`auto`, `fast`, `think`, etc.)
- Active skill slugs
- Inline tool definitions
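A minimal sketch of such a key derivation, assuming canonical JSON plus SHA-256; the actual hash function and field encoding are internal details of Theo and may differ.

```python
import hashlib
import json

# Sketch only: canonical JSON (sorted keys, no whitespace) hashed with
# SHA-256 gives a deterministic key for identical request parameters.
def cache_key(prompt: str, mode: str, skills: list[str], tools: list[dict]) -> str:
    payload = json.dumps(
        {"prompt": prompt, "mode": mode, "skills": sorted(skills), "tools": tools},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Sorting the skill slugs means two requests that list the same skills in a different order still share a key, while any change to the prompt text produces a different key.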
Requests with a `conversation_id` are never cached, since they depend on conversation history that changes between calls.

## TTL Behavior
Cache entries expire automatically based on the content type:

| Mode | Typical TTL |
|---|---|
| `fast` / `auto` | Minutes |
| `think` / `code` | Minutes to hours |
| `research` | Not cached (async job) |
| `image` / `video` | Not cached (generative) |
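The TTL behavior above can be modeled as a per-mode expiry table with lazy eviction. The TTL values and the `store`/`lookup` helpers here are assumptions for illustration; the real values are server-defined.

```python
import time

# Hypothetical per-mode TTLs in seconds; None means the mode is never cached.
MODE_TTL = {
    "fast": 300,        # minutes
    "auto": 300,
    "think": 3600,      # minutes to hours
    "code": 3600,
    "research": None,   # not cached (async job)
    "image": None,      # not cached (generative)
    "video": None,
}

def store(cache, key, value, mode, now=None):
    ttl = MODE_TTL.get(mode)
    if ttl is None:
        return False                    # uncacheable mode: skip the cache entirely
    cache[key] = (value, (now if now is not None else time.time()) + ttl)
    return True

def lookup(cache, key, now=None):
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if (now if now is not None else time.time()) >= expires_at:
        del cache[key]                  # expired: evict lazily on read
        return None
    return value
```

Passing `now` explicitly makes expiry easy to test without waiting on the clock.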
## Cache Scope
- The cache is per-user: your cached responses are never shared with other accounts
- With org-level API keys, the cache is scoped per key
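One common way to enforce this isolation is to fold the scope identifier into the cache key itself, so identical requests from different accounts can never collide. A sketch under that assumption (`scope_id`, a user id or org API-key id, is a hypothetical name):

```python
import hashlib

# Sketch: namespacing the request hash by scope id guarantees per-user
# (or per-key) isolation without a separate cache per account.
def scoped_key(scope_id: str, request_hash: str) -> str:
    return hashlib.sha256(f"{scope_id}:{request_hash}".encode("utf-8")).hexdigest()
```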
