Theo streams responses over Server-Sent Events (SSE). Set stream: true on any completion request to receive events as they happen instead of waiting for the full response. SSE is the same protocol OpenAI and Anthropic use — any HTTP client that can parse text/event-stream works. The SDK wraps it into an async iterable.

Why streaming?

  • Time-to-first-token under 300ms on fast modes — users see output immediately instead of staring at a spinner.
  • Agent-loop transparency — tool and artifact events fire the moment Theo uses a tool or produces a file, so you can render a live “Theo is browsing…” or “Generated image” card.
  • Graceful cancellation — close the HTTP connection (or call stream.cancel() in the SDK) to stop billing for tokens you don’t want.

Event flow

A normal turn emits events in this order:
  1. thinking — a heartbeat so proxies flush the response.
  2. meta — resolved mode, branded model, routing telemetry, conversation_id, request_id.
  3. skills (optional) — which skills fired for this turn.
  4. genui_meta (optional) — only when resolved_mode === "genui".
  5. tool / artifact — as they happen (may interleave).
  6. token — one per text chunk.
  7. done — full content, follow-ups, usage, conversation_id, request_id.

If anything fails, the server emits an error event whose payload matches the REST error envelope ({ error: { message, type, code, request_id } }) and closes the stream. No special handling is required: reuse your HTTP error code path. See Streaming Completions (API reference) for the per-event JSON schemas, wire-format rules, and a mid-stream 429 example.
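One way to consume this event order in a UI is a small pure reducer that folds each event into render state. The sketch below is illustrative only: the `StreamEvent` union, `TurnState`, and `reduceTurn` are hypothetical names, and the payload fields are assumptions limited to what this page mentions (the API reference holds the real schemas).

```typescript
// Event shapes below are ASSUMPTIONS modelled on fields this page mentions;
// the API reference holds the authoritative per-event schemas.
type StreamEvent =
  | { type: "thinking" }
  | { type: "meta"; data: { resolved_mode: string; request_id: string } }
  | { type: "tool"; data: { name: string; status: string } }
  | { type: "token"; token: string }
  | { type: "done"; data: { content: string } }
  | { type: "error"; data: { error: { message: string; request_id: string } } };

interface TurnState {
  status: "pending" | "streaming" | "done" | "failed";
  text: string;
  requestId: string | null;
  toolLog: string[];
  errorMessage: string | null;
}

const initialTurn: TurnState = {
  status: "pending",
  text: "",
  requestId: null,
  toolLog: [],
  errorMessage: null,
};

// Pure reducer: one event in, next UI state out.
function reduceTurn(state: TurnState, event: StreamEvent): TurnState {
  switch (event.type) {
    case "thinking":
      return state; // heartbeat only, nothing to render
    case "meta":
      return { ...state, status: "streaming", requestId: event.data.request_id };
    case "tool":
      return { ...state, toolLog: [...state.toolLog, `${event.data.name}:${event.data.status}`] };
    case "token":
      return { ...state, text: state.text + event.token };
    case "done":
      // done carries the full content, so it replaces the accumulated text.
      return { ...state, status: "done", text: event.data.content };
    case "error":
      // Terminal: the server closes the stream after this event.
      return { ...state, status: "failed", errorMessage: event.data.error.message };
  }
}

// Replay a sample turn in the documented order.
const sampleEvents: StreamEvent[] = [
  { type: "thinking" },
  { type: "meta", data: { resolved_mode: "fast", request_id: "req_9f2e1a" } },
  { type: "token", token: "Lines" },
  { type: "token", token: " of" },
  { type: "done", data: { content: "Lines of code" } },
];
const finalTurn = sampleEvents.reduce(reduceTurn, initialTurn);
console.log(finalTurn.status, JSON.stringify(finalTurn.text));
```

Because the error event is handled as just another terminal state, the same reducer covers both the happy path and a mid-stream failure.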

SDK Streaming

The SDK returns a TheoStream. It is async-iterable, so for await works out of the box. After the stream completes, it exposes conversationId, usage, model, and requestId as properties.
const stream = theo.stream({
  prompt: "Explain how TCP/IP works",
});

for await (const event of stream) {
  switch (event.type) {
    case "meta":     console.log("Mode:", event.data.resolved_mode); break;
    case "token":    process.stdout.write(event.token); break;
    case "tool":     console.log("Tool:", event.data.name, event.data.status); break;
    case "artifact": renderArtifact(event.data); break;
    case "error":    console.error("Stream error:", event.data.error); break;
    case "done":     console.log("\nCost:", event.data.usage.cost_cents); break;
  }
}

// Final metadata — populated from meta + done events.
console.log("conversation_id:", stream.conversationId);
console.log("usage:", stream.usage);
console.log("request_id:", stream.requestId);

Cancelling mid-generation

TheoStream.cancel() aborts the underlying HTTP connection, so the server stops generating and billing stops. The iterator ends after the next event boundary.
const stream = theo.stream({ prompt: "Write a 5000-word essay…" });
// Wire up your UI's "Stop generating" button:
cancelButton.onclick = () => stream.cancel();
for await (const event of stream) {
  // ...
}
This is the cleanest way to implement a chat UI’s “stop” button — no hacky HTTP connection teardown, no custom AbortController threading.
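Cancellation does not have to come from a button. As a rough sketch, the helper below stops a stream once a character budget is reached. `CancellableStream`, `collectUpTo`, and `FakeStream` are hypothetical names introduced for illustration; the only SDK behaviour assumed is what this page states, namely that the stream is async-iterable and has a cancel() method. The in-memory fake exists only so the example runs without the SDK.

```typescript
// Structural stand-in for TheoStream: only the two capabilities this page
// documents (async iteration and cancel()). The event shape is an ASSUMPTION.
type StreamEvent = { type: string; token?: string };

interface CancellableStream extends AsyncIterable<StreamEvent> {
  cancel(): void;
}

// Collect token text and cancel once maxChars characters have arrived.
async function collectUpTo(stream: CancellableStream, maxChars: number): Promise<string> {
  let text = "";
  for await (const event of stream) {
    if (event.type !== "token" || event.token === undefined) continue;
    text += event.token;
    if (text.length >= maxChars) {
      stream.cancel(); // same call a "Stop generating" button would make
      break;
    }
  }
  return text;
}

// Hypothetical in-memory stream so the sketch runs without the SDK.
class FakeStream implements CancellableStream {
  cancelled = false;
  private tokens: string[];

  constructor(tokens: string[]) {
    this.tokens = tokens;
  }

  cancel(): void {
    this.cancelled = true;
  }

  async *[Symbol.asyncIterator](): AsyncIterableIterator<StreamEvent> {
    for (const token of this.tokens) {
      if (this.cancelled) return; // stop producing once cancelled
      yield { type: "token", token };
    }
  }
}

const fake = new FakeStream(["Once ", "upon ", "a ", "time"]);
const collected = collectUpTo(fake, 8);
collected.then((text) => console.log(JSON.stringify(text), fake.cancelled));
```

With a real TheoStream, the same helper should apply unchanged as long as the stream satisfies the two-member interface above.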

Raw SSE (curl / fetch)

Use the canonical www host to avoid an apex-to-www redirect that some HTTP clients handle by stripping the Authorization header (see 401 Troubleshooting):
curl -N -X POST https://www.hitheo.ai/api/v1/completions \
  -H "Authorization: Bearer $THEO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a haiku about code", "stream": true}'
The response body is a stream of SSE events:
event: meta
data: {"id":"cmpl_abc","resolved_mode":"fast","model":{"id":"theo-1-flash","label":"Theo Flash","engine":"theo-core"},"conversation_id":null,"request_id":"req_9f2e1a", ...}

event: token
data: {"token":"Lines"}

event: token
data: {"token":" of"}

event: done
data: {"id":"cmpl_abc","content":"Lines of code","usage":{"cost_cents":0.01,"prompt_tokens":8,"completion_tokens":2,"total_tokens":10}, ...}
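If you consume the raw stream without the SDK, you need to split the body into events yourself. The following is a minimal sketch of an incremental text/event-stream parser that follows the general SSE framing rules (blank-line-delimited blocks, `event:` and `data:` fields, `:` comment lines). `SseParser` and `parseBlock` are hypothetical names, not part of the Theo SDK, and the sketch ignores bare-CR line endings and the `id:` and `retry:` fields.

```typescript
interface SseFrame {
  event: string; // value of the "event:" field, "message" if absent
  data: string; // "data:" lines joined with "\n"
}

// Parse one blank-line-delimited block into a frame, or null if it has no data.
function parseBlock(block: string): SseFrame | null {
  let event = "message";
  const dataLines: string[] = [];
  for (const line of block.split("\n")) {
    if (line === "" || line.startsWith(":")) continue; // blank or comment line
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
    if (field === "event") event = value;
    else if (field === "data") dataLines.push(value);
  }
  return dataLines.length > 0 ? { event, data: dataLines.join("\n") } : null;
}

class SseParser {
  private buffer = "";

  // Feed a decoded text chunk; returns every frame completed by this chunk.
  push(chunk: string): SseFrame[] {
    // Normalise CRLF so frames always end in "\n\n".
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, "\n");
    const frames: SseFrame[] = [];
    let end = this.buffer.indexOf("\n\n");
    while (end !== -1) {
      const frame = parseBlock(this.buffer.slice(0, end));
      if (frame !== null) frames.push(frame);
      this.buffer = this.buffer.slice(end + 2);
      end = this.buffer.indexOf("\n\n");
    }
    return frames;
  }
}

// Chunks deliberately split mid-line, as a network read might deliver them.
const sseParser = new SseParser();
const chunks = [
  'event: token\ndata: {"tok',
  'en":"Lines"}\n\nevent: token\n',
  'data: {"token":" of"}\n\n',
];
let streamedText = "";
for (const chunk of chunks) {
  for (const frame of sseParser.push(chunk)) {
    if (frame.event === "token") {
      streamedText += (JSON.parse(frame.data) as { token: string }).token;
    }
  }
}
console.log(JSON.stringify(streamedText));
```

The parser only accepts decoded text, so in a fetch-based client you would decode each body chunk (for example with a streaming TextDecoder) before calling push. Buffering across chunks matters because a network read can end in the middle of a line or a JSON payload.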

Multi-turn conversations

The conversation_id appears on both the meta and done events. When you pass one in the request, it is echoed back unchanged; when you don’t, it is null (the server doesn’t create a persistent conversation for API-key callers unless you explicitly call POST /api/v1/conversations). The SDK captures the ID automatically — read stream.conversationId after the stream completes.
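The echo-or-null behaviour can be handled with a small helper that only threads an ID forward when one exists. This is a sketch under stated assumptions: `buildRequestBody` and `TurnResult` are hypothetical names, `"conv_123"` is a made-up ID, and sending `conversation_id` as a request-body field is an assumption, since this page does not name the request parameter.

```typescript
// Mirrors the conversation_id field on the meta/done payloads described above.
interface TurnResult {
  conversation_id: string | null;
}

// ASSUMPTION: the request accepts a "conversation_id" body field. Check the
// API reference for the actual parameter name before relying on this.
function buildRequestBody(prompt: string, previous: TurnResult | null): Record<string, unknown> {
  const body: Record<string, unknown> = { prompt, stream: true };
  // Only thread an ID that actually exists; null means the server did not
  // attach this turn to a persistent conversation.
  if (previous !== null && previous.conversation_id !== null) {
    body["conversation_id"] = previous.conversation_id;
  }
  return body;
}

const firstBody = buildRequestBody("Explain how TCP/IP works", null);
const followUpBody = buildRequestBody("Now compare it to UDP", { conversation_id: "conv_123" });
console.log(JSON.stringify(firstBody));
console.log(JSON.stringify(followUpBody));
```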

E.V.I. Streaming

E.V.I. instances (theo.evi(...)) support streaming with the same TheoStream surface:
const evi = theo.evi({
  persona: "You are Kai, a coding assistant...",
});

const stream = evi.stream({
  prompt: "Debug this React hook",
});

for await (const event of stream) {
  if (event.type === "token") process.stdout.write(event.token);
}
console.log("request_id:", stream.requestId);