Completions
Create Completion
Send a prompt through the full orchestration pipeline — intent classification, skill loading, model routing, agent loop, and response.
POST
The core endpoint of the Theo API. Sends a prompt through the full orchestration pipeline and returns the complete response.
For real-time token delivery, set stream: true or see Streaming Completions.
Authentication
Requires a Bearer token. See Authentication.
Request Body
The prompt text. Must be a non-empty string.
Execution mode. When set to auto, Theo classifies the prompt and selects the optimal engine automatically.
Available modes:
- auto: Classify prompt and route to best engine (default)
- fast: Low-latency responses for simple queries
- think: Deep reasoning for complex analysis
- code: Code generation (Theo Code engine, extended output budget)
- image: Image generation (Theo Create)
- video: Async. Use POST /api/v1/video with job polling, not this endpoint.
- research: Async. Use POST /api/v1/research with job polling, not this endpoint.
- roast: Humorous, irreverent tone
- genui: Generate interactive UI components (OpenUI Lang)
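The mode rules above can be checked on the client before a request is sent. The following is a minimal sketch, not the official client: it assumes the request fields are named "prompt" and "mode", and the helper name build_completion_body is hypothetical.

```python
import json

# Modes the list above describes as served by this endpoint.
SYNC_MODES = {"auto", "fast", "think", "code", "image", "roast", "genui"}
# Modes the list above marks as async, with the endpoint to use instead.
ASYNC_MODE_ENDPOINTS = {
    "video": "/api/v1/video",
    "research": "/api/v1/research",
}


def build_completion_body(prompt: str, mode: str = "auto") -> str:
    """Return a JSON request body, or raise ValueError for unusable input."""
    if not isinstance(prompt, str) or not prompt:
        raise ValueError("prompt must be a non-empty string")
    if mode in ASYNC_MODE_ENDPOINTS:
        endpoint = ASYNC_MODE_ENDPOINTS[mode]
        raise ValueError(f"mode {mode!r} is async: use POST {endpoint} and poll the job")
    if mode not in SYNC_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # Assumption: the fields are named "prompt" and "mode".
    return json.dumps({"prompt": prompt, "mode": mode})
```

Rejecting video and research locally avoids a round trip that this endpoint is documented not to serve.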
Enable SSE streaming. When true, returns a text/event-stream response instead of JSON. See Streaming.
Continue an existing conversation. Pass the conversation ID to maintain multi-turn context.
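Multi-turn use amounts to carrying the conversation ID from one response into the next request. A minimal sketch, assuming the response exposes the ID under a "conversation_id" key (the request field name is taken from this page, the response key is an assumption) and using a hypothetical helper name:

```python
from typing import Any, Optional


def follow_up_body(previous_response: dict[str, Any], prompt: str) -> dict[str, Any]:
    """Build the next turn's request body from the previous turn's response."""
    body: dict[str, Any] = {"prompt": prompt}
    # Assumption: the response carries the ID under "conversation_id".
    conversation_id: Optional[str] = previous_response.get("conversation_id")
    if conversation_id is not None:
        # Only attach the field when a conversation actually exists.
        body["conversation_id"] = conversation_id
    return body
```

When the previous response reports no conversation, the helper omits the field, so the request is treated as a fresh, non-conversation completion.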
Skill slugs to activate for this request. These are merged with the user’s installed skills.
Each slug activates a skill’s prompt extension, tools, and model preferences for this completion. You can find slugs in the dashboard (copy icon on each skill card), via GET /api/v1/skills, or in the E.V.I. Canvas Input node.
See Activating Skills via API for the full guide.
Inline tool definitions the model can call during the agent loop.
Override Theo’s personality for this request.
- "theo": Default Theo persona
- "none": No persona (raw model output)
- { "system_prompt": "You are..." }: Custom system prompt
Sampling temperature (0–2). Higher values produce more creative output.
Maximum agent loop iterations (1–20). Each iteration is a think → act → observe cycle.
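The two documented ranges (0 to 2 for temperature, 1 to 20 for loop iterations) are easy to enforce before sending. A minimal sketch: the key names "temperature" and "max_iterations" are hypothetical, since the field names are not confirmed here, and the helper name is illustrative.

```python
def sampling_options(temperature: float, max_iterations: int) -> dict:
    """Validate the documented ranges and return both options as a dict."""
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_iterations, bool) or not isinstance(max_iterations, int):
        raise ValueError("max_iterations must be an integer")
    if not 1 <= max_iterations <= 20:
        raise ValueError("max_iterations must be between 1 and 20")
    # Hypothetical key names; check the actual request schema before use.
    return {"temperature": temperature, "max_iterations": max_iterations}
```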
Override the engine used for specific modes. Keys are mode names (e.g., "code", "think"), values are Theo engine IDs (e.g., "theo-1-reason", "theo-1-flash"). See List Models for valid engine IDs.
Response format. "theo" for the default format, "openai" for OpenAI-compatible format.
Arbitrary key-value metadata attached to the completion. Returned in the response and logged in the audit trail.
Component library identifier for GenUI mode. Used by E.V.I. callers for custom UI rendering.
Request Examples
With Skills and Tools
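A minimal sketch of a request body that activates one skill and supplies one inline tool. Everything specific here is an assumption for illustration: the key names "skills" and "tools", the slug "example-skill", and the tool definition layout (name, description, JSON Schema parameters), which is modeled on common function-calling conventions rather than on a schema confirmed by this page.

```python
import json

request_body = {
    "prompt": "What is the weather in Lisbon right now?",
    "mode": "auto",
    # Hypothetical slug; real slugs come from the dashboard or GET /api/v1/skills.
    "skills": ["example-skill"],
    # Hypothetical inline tool definition the model may call in the agent loop.
    "tools": [
        {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
}

# Serialize to the JSON string that would be sent as the request body.
payload = json.dumps(request_body, indent=2)
print(payload)
```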
Response
Unique completion ID (prefixed cmpl_).
Always "completion".
ISO 8601 timestamp.
The generated text content.
The mode you requested (e.g., "auto").
The mode Theo actually used after intent classification (e.g., "fast", "think", "code").
The Theo engine that handled the request.
Tools called during the agent loop.
Generated files (images, code, documents) produced during the completion.
Suggested next prompts.
Token counts and cost.
For non-text modes (image, video, tts, stt), prompt_tokens and completion_tokens are always 0, because tokens are not a meaningful billing unit there. Use usage.cost_cents as the sole usage metric for those modes.
The metadata you passed in the request, echoed back.
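The usage rule for non-text modes can be sketched as a small reporting helper. It assumes the usage object is a dict with the prompt_tokens, completion_tokens, and cost_cents keys named on this page, and that the caller passes in the mode Theo actually used; the function name is hypothetical.

```python
NON_TEXT_MODES = {"image", "video", "tts", "stt"}


def describe_usage(mode: str, usage: dict) -> str:
    """Summarize usage, reporting only cost for non-text modes."""
    cost = usage["cost_cents"]
    if mode in NON_TEXT_MODES:
        # Token counts are always 0 for these modes, so cost is the only useful figure.
        return f"{mode}: {cost} cents"
    total_tokens = usage["prompt_tokens"] + usage["completion_tokens"]
    return f"{mode}: {total_tokens} tokens, {cost} cents"
```

Keying the branch on the mode, instead of on zero token counts, avoids misreading a text completion that happens to report zero tokens.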
The server-side conversation id this turn resolved against. null when no conversation was created or attached. Echoed unchanged when you passed conversation_id in the request.
Server-assigned request identifier (also returned as the X-Request-Id header). Include this in support tickets so we can look up the request in logs.
Example Response
OpenAI-Compatible Format
Pass format: "openai" to receive responses in OpenAI’s chat.completions format. This allows drop-in replacement in existing OpenAI-based applications.
The response follows the OpenAI chat.completion schema with choices, usage, and model fields.
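With the OpenAI-compatible format, the generated text sits where OpenAI clients expect it. A minimal sketch of extracting it from an already-parsed response, assuming the standard chat.completion nesting (choices[0].message.content); the helper name is hypothetical.

```python
def first_choice_text(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat.completion object."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response has no choices")
    # Standard chat.completion layout: choices[i].message.content
    content = choices[0]["message"]["content"]
    # content can be null in OpenAI-style responses (e.g. tool-call turns).
    return content or ""
```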
Semantic Caching
Non-conversation completions (no conversation_id) are automatically cached. Identical requests return cached results instantly at zero cost. See Semantic Caching.
Cached responses include "_cached": true in the response body.
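A client can use the "_cached" marker to separate cache hits from billed completions, for example when tallying spend. A minimal sketch over already-parsed response bodies; the function names are hypothetical.

```python
def is_cached(response_body: dict) -> bool:
    """True only when the body carries "_cached": true."""
    return response_body.get("_cached") is True


def count_cache_hits(response_bodies: list) -> int:
    """Count how many parsed responses were served from the cache."""
    return sum(1 for body in response_bodies if is_cached(body))
```

A missing "_cached" key is treated as a cache miss, which matches the description that the flag is included on cached responses.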
Errors
| Status | Code | Description |
|---|---|---|
| 400 | validation_error | Invalid request body (missing prompt, invalid mode, etc.) |
| 401 | invalid_api_key | Missing or invalid API key |
| 402 | insufficient_credits | Account has insufficient balance |
| 404 | not_found | Conversation ID not found |
| 429 | rate_limit_exceeded | Too many requests — check Retry-After header |
| 500 | server_error | Internal server error |
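The table above suggests a simple client-side policy: wait and retry on 429, honoring Retry-After, and treat the 4xx codes as errors to fix in the request. The sketch below is one possible policy, not documented behavior: retrying 500 with exponential backoff and the 30-second cap are assumptions, and the function name is hypothetical.

```python
from typing import Mapping, Optional

MAX_BACKOFF_SECONDS = 30.0  # assumed cap, not a documented limit


def retry_delay(status: int, headers: Mapping[str, str], attempt: int) -> Optional[float]:
    """Return seconds to wait before retrying, or None if a retry will not help."""
    backoff = float(min(2 ** attempt, MAX_BACKOFF_SECONDS))
    if status == 429:
        retry_after = headers.get("Retry-After", "")
        # Retry-After may also be an HTTP date; this sketch handles only the seconds form.
        if retry_after.isdigit():
            return float(retry_after)
        return backoff
    if status == 500:
        # Assumption: transient server errors are worth retrying with backoff.
        return backoff
    # 400, 401, 402, 404: the request or account must change first.
    return None
```

The headers argument is read with an exact-case key, so pass a case-insensitive mapping (most HTTP clients provide one) or normalize header names first.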
