Streaming Completions

Set stream: true on any POST /api/v1/completions request to receive a Server-Sent Events (SSE) stream instead of a single JSON response.

Request

curl -N -X POST https://www.hitheo.ai/api/v1/completions \
  -H "Authorization: Bearer $THEO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a haiku about code", "stream": true}'

Requests must be authenticated with a Bearer token. See Authentication.

Wire Format

Every event is two lines followed by a blank line:

event: <event_name>
data: <json_payload>

Callers MUST buffer partial chunks and split on \n\n. Token events can be tens of bytes each, and the transport does not guarantee that chunk boundaries align with event boundaries, so a single read may contain several events or only part of one.
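The buffering rule above can be sketched as a small incremental parser. This is a minimal illustration, not the SDK's implementation: the names SseEvent, SseBuffer, and parseBlock are hypothetical, and the sketch assumes LF line endings and exactly one data: line per event, as in the wire format shown above.

```typescript
// Hypothetical helper types and names; not part of the Theo SDK.
interface SseEvent {
  event: string; // value of the "event:" line
  data: string;  // raw text of the "data:" line (JSON, not yet parsed)
}

// Parses one complete event block (the text between two blank lines).
function parseBlock(block: string): SseEvent | null {
  let eventName = "";
  let data = "";
  for (const line of block.split("\n")) {
    if (line.startsWith("event:")) eventName = line.slice(6).trim();
    else if (line.startsWith("data:")) data = line.slice(5).trim();
  }
  return eventName ? { event: eventName, data } : null;
}

class SseBuffer {
  private buffer = "";

  // Feed one decoded chunk; returns every event completed by this chunk.
  // Any trailing partial event stays in the buffer until the next push.
  push(chunk: string): SseEvent[] {
    this.buffer += chunk;
    const events: SseEvent[] = [];
    let boundary = this.buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const parsed = parseBlock(this.buffer.slice(0, boundary));
      if (parsed) events.push(parsed);
      this.buffer = this.buffer.slice(boundary + 2);
      boundary = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}
```

Feeding the parser a chunk that ends mid-event returns an empty array; the event is emitted only once the chunk containing its terminating blank line arrives.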

Event Types (in order of arrival)

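The server emits events in a fixed order: thinking → meta → (skills? / genui_meta?) → tool* → artifact* → token* → done. Here ? marks an event that appears at most once and * marks an event that may repeat zero or more times. If anything fails mid-stream, the server emits an error event and closes the stream.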

Event	Description
`thinking`	Heartbeat — a single byte so proxies flush the response immediately. No JSON payload
`meta`	Engine info, resolved mode, artifacts, routing, `conversation_id`, `request_id`
`skills`	Active skills for this turn (only when one or more skills were applied)
`genui_meta`	GenUI component library info (only when `resolved_mode === "genui"`)
`tool`	A tool was called (may fire multiple times)
`artifact`	A generated file (image, document, code, video) (may fire multiple times)
`token`	A text chunk (fires for each token)
`done`	Final payload: full content, follow-ups, usage, `conversation_id`, `request_id`
`error`	An error occurred — payload matches the REST error envelope
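For client-side tests, the documented ordering can be checked with a small helper. The sketch below is an assumption-laden illustration: PHASE_RANK and firstOutOfOrder are hypothetical names, and the check only verifies that events never move backwards in the documented order and that nothing follows done or error. It does not verify that thinking and meta each appear exactly once.

```typescript
// Hypothetical rank table derived from the documented order.
// skills and genui_meta share a rank because the docs do not order them relative to each other.
const PHASE_RANK = new Map<string, number>([
  ["thinking", 0], ["meta", 1], ["skills", 2], ["genui_meta", 2],
  ["tool", 3], ["artifact", 4], ["token", 5], ["done", 6],
]);

// Returns the index of the first out-of-order event name, or -1 if the sequence is valid.
function firstOutOfOrder(names: string[]): number {
  let rank = -1;
  let closed = false; // set once done or error has been seen
  return names.findIndex((eventName) => {
    if (closed) return true; // nothing may follow done or error
    if (eventName === "error") {
      closed = true; // error may arrive at any point and ends the stream
      return false;
    }
    const next = PHASE_RANK.get(eventName);
    if (next === undefined || next < rank) return true; // unknown name or backwards step
    rank = next;
    closed = eventName === "done";
    return false;
  });
}
```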

Payload Schemas

`meta`

interface StreamMetaData {
  id: string;                       // completion id (cmpl_...)
  mode: ChatMode;                   // mode you requested
  resolved_mode: ChatMode;          // mode after intent classification
  model: {
    id: string;                     // theo-branded id (e.g. "theo-1-flash")
    label: string;                  // human-friendly label
    engine: string;                 // engine subsystem (e.g. "theo-core")
  };
  tools: Array<{ name: string; status: string; description?: string }>;
  artifacts: unknown[];             // pre-populated artifacts (images, docs)
  brand?: Record<string, unknown>;  // optional brand-soul overlay
  routing?: Record<string, unknown>;// routing-engine telemetry
  conversation_id: string | null;   // null for stateless callers
  request_id: string;               // mirror of X-Request-Id header
}

`token`

interface StreamTokenData {
  token: string; // text chunk; concatenate in order to reconstruct the full content
}
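As a minimal sketch of the concatenation rule, the helper below rebuilds the full text from the raw data: lines of successive token events. The function name reconstructContent is hypothetical, and the input is assumed to be the JSON text of each token payload in arrival order.

```typescript
interface StreamTokenData {
  token: string;
}

// Hypothetical helper: joins token payloads in arrival order.
function reconstructContent(tokenDataLines: string[]): string {
  let content = "";
  for (const line of tokenDataLines) {
    const payload = JSON.parse(line) as StreamTokenData;
    content += payload.token; // tokens carry their own leading spaces; do not insert separators
  }
  return content;
}
```

Applied to the three token payloads in the example stream on this page, the result is "Lines of code", which matches the content field of the done event.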

`tool`

interface StreamToolData {
  name: string;        // tool name (e.g. "Intent classifier", "browser_navigate")
  status: string;      // "pending" | "running" | "complete" | "error"
  description?: string;// short human-readable description
}

`artifact`

The payload depends on artifact.type. Common fields: id, type, title, providerId, modelId. Image artifacts include imageUrl; video artifacts include videoUrl; document artifacts include downloadUrl and sizeBytes; code artifacts include code and language.
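One way to model the type-dependent payload is a discriminated union. The sketch below is an assumption, not the SDK's published types: the literal type values ("image", "video", "document", "code") are inferred from the event table, and the field types (string, number) and the names ArtifactBase, Artifact, and describeArtifact are hypothetical.

```typescript
// Fields listed as common to every artifact.
interface ArtifactBase {
  id: string;
  title: string;
  providerId: string;
  modelId: string;
}

// Assumed discriminants and field types; verify against real payloads.
type Artifact =
  | (ArtifactBase & { type: "image"; imageUrl: string })
  | (ArtifactBase & { type: "video"; videoUrl: string })
  | (ArtifactBase & { type: "document"; downloadUrl: string; sizeBytes: number })
  | (ArtifactBase & { type: "code"; code: string; language: string });

// Narrowing on artifact.type gives access to the type-specific fields.
function describeArtifact(artifact: Artifact): string {
  switch (artifact.type) {
    case "image":
      return `image: ${artifact.imageUrl}`;
    case "video":
      return `video: ${artifact.videoUrl}`;
    case "document":
      return `document: ${artifact.downloadUrl} (${artifact.sizeBytes} bytes)`;
    case "code":
      return `code (${artifact.language}): ${artifact.code.length} chars`;
  }
}
```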

`genui_meta`

interface GenUIMetaEvent {
  library: string; // component library id (e.g. "theo")
  tools: string[]; // tool names the Renderer's toolProvider should expose
}

`skills`

interface StreamSkillsData {
  active: Array<{
    id: string;
    slug: string;
    name: string;
    intensity?: number; // 0–100 when the skill supports intensity
  }>;
}

`done`

interface StreamDoneData {
  id: string;
  content: string;                  // full text output (same as concatenated tokens)
  follow_ups: Array<{ label: string; prompt: string }>;
  structured_output?: {
    type: string;
    data: unknown;
    parse_error?: string;
  };
  skills_active?: Array<{ id: string; slug: string; name: string; intensity?: number }>;
  routing?: Record<string, unknown>;
  usage: {
    cost_cents: number;
    prompt_tokens: number;    // always 0 for non-text modes
    completion_tokens: number;// always 0 for non-text modes
    total_tokens: number;
    cached?: boolean;         // true when served from the semantic cache
  };
  conversation_id: string | null;
  request_id: string;
}
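The optional structured_output field might be consumed as sketched below. This assumes that a present parse_error means data should not be trusted, which the schema suggests but this page does not state explicitly. The names StructuredOutput and readStructuredOutput are hypothetical.

```typescript
interface StructuredOutput {
  type: string;
  data: unknown;
  parse_error?: string;
}

// Hypothetical helper: returns the structured data, undefined when the field is
// absent, and throws when the server reported a parse failure (assumed semantics).
function readStructuredOutput(done: { structured_output?: StructuredOutput }): unknown {
  const output = done.structured_output;
  if (!output) return undefined;
  if (output.parse_error) {
    throw new Error(`structured output (${output.type}) failed to parse: ${output.parse_error}`);
  }
  return output.data;
}
```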

`error`

The error event payload matches the REST error envelope (see Overview):

interface StreamErrorData {
  error: {
    message: string;
    type: string;            // e.g. "rate_limit_error", "server_error"
    code: string;            // e.g. "stream_interrupted", "orchestration_failed"
    request_id: string | null;
  };
}

Example Stream

event: thinking
data: {"status":"processing"}

event: meta
data: {"id":"cmpl_abc123","mode":"auto","resolved_mode":"fast","model":{"id":"theo-1-flash","label":"Theo Flash","engine":"theo-core"},"tools":[],"artifacts":[],"conversation_id":null,"request_id":"req_9f2e1a"}

event: token
data: {"token":"Lines"}

event: token
data: {"token":" of"}

event: token
data: {"token":" code"}

event: done
data: {"id":"cmpl_abc123","content":"Lines of code","follow_ups":[],"usage":{"cost_cents":0.01,"prompt_tokens":8,"completion_tokens":3,"total_tokens":11},"conversation_id":null,"request_id":"req_9f2e1a"}

Extracting `conversation_id`

conversation_id appears on both meta and done. For stateless callers (no conversation_id in the request) it is null. For callers that passed a conversation_id in the request, it is echoed back unchanged. The SDK exposes this as stream.conversationId after the stream completes.

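import { Theo } from "@hitheo/sdk";

const theo = new Theo({ apiKey: process.env.THEO_API_KEY! });
const stream = theo.stream({ prompt: "..." });
for await (const event of stream) {
  // consume events; conversationId is only populated once meta / done arrive
}
console.log(stream.conversationId); // null for stateless callers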

Mid-Stream Errors

When orchestration fails after meta has already been emitted (e.g. the upstream LLM provider times out or a per-mode rate limit kicks in), the server emits an error event with the REST envelope and closes the stream:

event: meta
data: {"id":"cmpl_abc123","mode":"auto","resolved_mode":"fast","model":{"id":"theo-1-flash","label":"Theo Flash","engine":"theo-core"},"tools":[],"artifacts":[],"conversation_id":null,"request_id":"req_9f2e1a"}

event: token
data: {"token":"Lines"}

event: error
data: {"error":{"message":"Rate limit exceeded for fast mode (120 RPM on 'free' tier).","type":"rate_limit_error","code":"mode_rate_limit_exceeded","request_id":"req_9f2e1a"}}

Use the same code path as your REST error handler:

for await (const event of stream) {
  if (event.type === "error") {
    const { message, type, code, request_id } = event.data.error;
    throw new Error(`[${type}/${code}] ${message} (request_id=${request_id})`);
  }
  // ...
}
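If your REST handler already raises a typed error, the stream payload can be wrapped the same way so that callers can branch on type and code instead of parsing the message. The class below is a hypothetical sketch, not an SDK export; TheoStreamError and formatStreamError are illustrative names.

```typescript
interface StreamErrorData {
  error: {
    message: string;
    type: string;
    code: string;
    request_id: string | null;
  };
}

// Hypothetical formatter; produces the same message shape as the handler above.
function formatStreamError(payload: StreamErrorData): string {
  const { message, type, code, request_id } = payload.error;
  return `[${type}/${code}] ${message} (request_id=${request_id})`;
}

// Hypothetical error class carrying the envelope fields as properties.
class TheoStreamError extends Error {
  readonly type: string;
  readonly code: string;
  readonly requestId: string | null;

  constructor(payload: StreamErrorData) {
    super(formatStreamError(payload));
    this.name = "TheoStreamError";
    this.type = payload.error.type;
    this.code = payload.error.code;
    this.requestId = payload.error.request_id;
  }
}
```

Inside the loop above, this would replace the plain Error with throw new TheoStreamError(event.data).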

SDK Usage

theo.stream() returns a TheoStream: an async iterable with a cancel() method for aborting mid-generation, plus properties (conversationId, usage, requestId) that hold the final metadata once the stream completes.

import { Theo } from "@hitheo/sdk";

const theo = new Theo({ apiKey: process.env.THEO_API_KEY! });
const stream = theo.stream({ prompt: "Explain DNS" });

// Graceful cancel after 3s (e.g. user clicked "Stop generating")
setTimeout(() => stream.cancel(), 3000);

for await (const event of stream) {
  switch (event.type) {
    case "meta":     console.log("Mode:", event.data.resolved_mode); break;
    case "token":    process.stdout.write(event.data.token); break;
    case "tool":     console.log("Tool:", event.data.name, event.data.status); break;
    case "artifact": console.log("Artifact:", event.data); break;
    case "error":    console.error("Stream error:", event.data.error); break;
    case "done":     console.log("\nCost:", event.data.usage.cost_cents); break;
  }
}

console.log("conversation_id:", stream.conversationId);
console.log("usage:", stream.usage);
console.log("request_id:", stream.requestId);

See Streaming for the conceptual guide.
