Set stream: true on any POST /api/v1/completions request to receive a Server-Sent Events (SSE) stream instead of a single JSON response.

Request

curl -N -X POST https://www.hitheo.ai/api/v1/completions \
  -H "Authorization: Bearer $THEO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a haiku about code", "stream": true}'
Requires authentication via Bearer token. See Authentication.
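For callers that use neither curl nor the SDK, the same request can be sketched with the standard `fetch` API. This is a minimal sketch: `buildStreamRequest` and `readChunks` are hypothetical helper names, and only the URL, headers, and body come from the curl example above.

```typescript
// Hypothetical helper: builds the same request the curl example sends.
function buildStreamRequest(
  apiKey: string,
  prompt: string,
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: "https://www.hitheo.ai/api/v1/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ prompt, stream: true }),
    },
  };
}

// Hypothetical helper: yields the response body as decoded text chunks.
// Chunk boundaries are arbitrary and do not line up with SSE events.
async function* readChunks(apiKey: string, prompt: string): AsyncGenerator<string> {
  const { url, init } = buildStreamRequest(apiKey, prompt);
  const res = await fetch(url, init);
  if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries
    yield decoder.decode(value, { stream: true });
  }
}
```

The chunks yielded here still need to be reassembled into events before any payload is parsed.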

Wire Format

Every event is two lines followed by a blank line:
event: <event_name>
data: <json_payload>
Callers MUST buffer partial chunks and split on \n\n — token events can be tens of bytes each and the transport does not guarantee alignment with event boundaries.
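The buffering rule above can be sketched as a small accumulator that only emits an event once its terminating blank line has arrived. `SseBuffer` and `parseBlock` are hypothetical names, and the sketch assumes `\n` line endings as described here (a server or proxy that sends `\r\n` would need normalizing first).

```typescript
interface SseEvent {
  event: string;
  data: string; // raw payload; the caller decides whether to JSON.parse it
}

// Parses one "event: ...\ndata: ..." block into its two fields.
function parseBlock(block: string): SseEvent | null {
  let event = "";
  const data: string[] = [];
  for (const line of block.split("\n")) {
    if (line.startsWith("event:")) event = line.slice(6).trim();
    else if (line.startsWith("data:")) data.push(line.slice(5).trim());
  }
  return event ? { event, data: data.join("\n") } : null;
}

class SseBuffer {
  private buffer = "";

  // Feed one decoded chunk; returns every event that this chunk completed.
  push(chunk: string): SseEvent[] {
    this.buffer += chunk;
    const events: SseEvent[] = [];
    let boundary = this.buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const block = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2); // keep the unfinished tail
      const parsed = parseBlock(block);
      if (parsed) events.push(parsed);
      boundary = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}
```

Keeping `data` as a raw string leaves the decision to parse JSON with the caller, which helps for events that may carry no JSON payload.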

Event Types (in order of arrival)

The server emits events in a fixed order: thinking → meta → (skills? / genui_meta?) → tool* → artifact* → token* → done. If anything fails mid-stream, an error event is emitted and the stream closes.
| Event | Description |
| --- | --- |
| thinking | Heartbeat: a single byte so proxies flush the response immediately. No JSON payload |
| meta | Engine info, resolved mode, artifacts, routing, conversation_id, request_id |
| skills | Active skills for this turn (only when one or more skills were applied) |
| genui_meta | GenUI component library info (only when resolved_mode === "genui") |
| tool | A tool was called (may fire multiple times) |
| artifact | A generated file (image, document, code, video) (may fire multiple times) |
| token | A text chunk (fires for each token) |
| done | Final payload: full content, follow-ups, usage, conversation_id, request_id |
| error | An error occurred; payload matches the REST error envelope |
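The documented ordering can be checked in tests or debug builds with a rank table. The sketch below is an assumption-laden illustration: `ORDER` and `checkEventOrder` are hypothetical names, skills and genui_meta are treated as sharing one slot, and an error is accepted only as the last event.

```typescript
// Rank of each event in the documented order; equal ranks may interleave.
const ORDER: Record<string, number> = {
  thinking: 0,
  meta: 1,
  skills: 2,
  genui_meta: 2,
  tool: 3,
  artifact: 4,
  token: 5,
  done: 6,
};

// Returns null when the sequence is consistent with the documented order,
// otherwise a message describing the first violation.
function checkEventOrder(names: string[]): string | null {
  let lastRank = -1;
  for (const [i, name] of names.entries()) {
    if (name === "error") {
      // error is terminal: nothing may follow it
      return i === names.length - 1 ? null : "events after error";
    }
    const rank = ORDER[name];
    if (rank === undefined) return `unknown event: ${name}`;
    if (rank < lastRank) return `${name} arrived out of order`;
    lastRank = rank;
  }
  return null;
}
```

A check like this is better suited to logging a warning than to rejecting a live stream, since a client that fails hard on an unexpected event would break if new event types are added.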

Payload Schemas

meta

interface StreamMetaData {
  id: string;                       // completion id (cmpl_...)
  mode: ChatMode;                   // mode you requested
  resolved_mode: ChatMode;          // mode after intent classification
  model: {
    id: string;                     // theo-branded id (e.g. "theo-1-flash")
    label: string;                  // human-friendly label
    engine: string;                 // engine subsystem (e.g. "theo-core")
  };
  tools: Array<{ name: string; status: string; description?: string }>;
  artifacts: unknown[];             // pre-populated artifacts (images, docs)
  brand?: Record<string, unknown>;  // optional brand-soul overlay
  routing?: Record<string, unknown>;// routing-engine telemetry
  conversation_id: string | null;   // null for stateless callers
  request_id: string;               // mirror of X-Request-Id header
}

token

interface StreamTokenData {
  token: string; // text chunk; concatenate in order to reconstruct the full content
}

tool

interface StreamToolData {
  name: string;        // tool name (e.g. "Intent classifier", "browser_navigate")
  status: string;      // "pending" | "running" | "complete" | "error"
  description?: string;// short human-readable description
}

artifact

The payload depends on artifact.type. Common fields: id, type, title, providerId, modelId. Image artifacts include imageUrl; video artifacts include videoUrl; document artifacts include downloadUrl and sizeBytes; code artifacts include code and language.
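One way to model this payload is a discriminated union keyed on type. The field names below come from the paragraph above; the field types and the exact type string literals ("image", "video", "document", "code") are assumptions, and `describeArtifact` is a hypothetical helper.

```typescript
// Fields listed as common to every artifact (types assumed to be strings).
interface ArtifactBase {
  id: string;
  title: string;
  providerId: string;
  modelId: string;
}

// Assumed literals for `type`; check real payloads before relying on them.
type Artifact =
  | (ArtifactBase & { type: "image"; imageUrl: string })
  | (ArtifactBase & { type: "video"; videoUrl: string })
  | (ArtifactBase & { type: "document"; downloadUrl: string; sizeBytes: number })
  | (ArtifactBase & { type: "code"; code: string; language: string });

// Narrowing on `type` gives access to the fields specific to each variant.
function describeArtifact(a: Artifact): string {
  switch (a.type) {
    case "image":
      return `image "${a.title}" at ${a.imageUrl}`;
    case "video":
      return `video "${a.title}" at ${a.videoUrl}`;
    case "document":
      return `document "${a.title}" (${a.sizeBytes} bytes) at ${a.downloadUrl}`;
    case "code":
      return `${a.language} snippet "${a.title}" (${a.code.length} chars)`;
  }
}
```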

genui_meta

interface GenUIMetaEvent {
  library: string; // component library id (e.g. "theo")
  tools: string[]; // tool names the Renderer's toolProvider should expose
}

skills

interface StreamSkillsData {
  active: Array<{
    id: string;
    slug: string;
    name: string;
    intensity?: number; // 0–100 when the skill supports intensity
  }>;
}

done

interface StreamDoneData {
  id: string;
  content: string;                  // full text output (same as concatenated tokens)
  follow_ups: Array<{ label: string; prompt: string }>;
  structured_output?: {
    type: string;
    data: unknown;
    parse_error?: string;
  };
  skills_active?: Array<{ id: string; slug: string; name: string; intensity?: number }>;
  routing?: Record<string, unknown>;
  usage: {
    cost_cents: number;
    prompt_tokens: number;    // always 0 for non-text modes
    completion_tokens: number;// always 0 for non-text modes
    total_tokens: number;
    cached?: boolean;         // true when served from the semantic cache
  };
  conversation_id: string | null;
  request_id: string;
}
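Because done.content is described as equal to the concatenated tokens, a client can render tokens incrementally and still treat the done payload as the authoritative text. The following sketch assumes events shaped as `{ type, data }`, as in the SDK examples on this page; `MinimalEvent` and `collectText` are hypothetical names.

```typescript
// Only the fields this sketch needs; real events carry more.
interface MinimalEvent {
  type: string;
  data?: { token?: string; content?: string };
}

// Concatenates token chunks in arrival order and captures done.content.
function collectText(events: MinimalEvent[]): { text: string; done: string | null } {
  let text = "";
  let done: string | null = null;
  for (const e of events) {
    if (e.type === "token") text += e.data?.token ?? "";
    else if (e.type === "done") done = e.data?.content ?? null;
  }
  return { text, done };
}
```

If `done` is still null once the stream has ended, the stream was cut short and `text` holds only partial output.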

error

The error event payload matches the REST error envelope (see Overview):
interface StreamErrorData {
  error: {
    message: string;
    type: string;            // e.g. "rate_limit_error", "server_error"
    code: string;            // e.g. "stream_interrupted", "orchestration_failed"
    request_id: string | null;
  };
}

Example Stream

event: thinking
data: {"status":"processing"}

event: meta
data: {"id":"cmpl_abc123","mode":"auto","resolved_mode":"fast","model":{"id":"theo-1-flash","label":"Theo Flash","engine":"theo-core"},"tools":[],"artifacts":[],"conversation_id":null,"request_id":"req_9f2e1a"}

event: token
data: {"token":"Lines"}

event: token
data: {"token":" of"}

event: token
data: {"token":" code"}

event: done
data: {"id":"cmpl_abc123","content":"Lines of code","follow_ups":[],"usage":{"cost_cents":0.01,"prompt_tokens":8,"completion_tokens":3,"total_tokens":11},"conversation_id":null,"request_id":"req_9f2e1a"}

Extracting conversation_id

conversation_id appears on both meta and done. For stateless callers (no conversation_id in the request) it is null. For callers that passed a conversation_id in the request, it is echoed back unchanged. The SDK exposes this as stream.conversationId after the stream completes.
const stream = theo.stream({ prompt: "..." });
for await (const event of stream) {
  // consume events
}
console.log(stream.conversationId); // populated from meta / done
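A client that parses the raw SSE stream itself can capture the same value from the meta and done payloads. This is a sketch under the assumption that events have already been parsed into `{ type, data }` objects; `IdBearingEvent` and `extractConversationId` are hypothetical names, not part of the SDK.

```typescript
interface IdBearingEvent {
  type: string;
  data?: { conversation_id?: string | null };
}

// Returns the last non-null conversation_id seen on a meta or done event,
// or null for a stateless stream.
function extractConversationId(events: IdBearingEvent[]): string | null {
  let id: string | null = null;
  for (const e of events) {
    const cid = e.data?.conversation_id;
    if ((e.type === "meta" || e.type === "done") && cid != null) {
      id = cid;
    }
  }
  return id;
}
```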

Mid-Stream Errors

When orchestration fails after meta has already been emitted (e.g. the upstream LLM provider times out or a per-mode rate limit kicks in), the server emits an error event with the REST envelope and closes the stream:
event: meta
data: {"id":"cmpl_abc123","mode":"auto","resolved_mode":"fast","model":{"id":"theo-1-flash","label":"Theo Flash","engine":"theo-core"},"tools":[],"artifacts":[],"conversation_id":null,"request_id":"req_9f2e1a"}

event: token
data: {"token":"Lines"}

event: error
data: {"error":{"message":"Rate limit exceeded for fast mode (120 RPM on 'free' tier).","type":"rate_limit_error","code":"mode_rate_limit_exceeded","request_id":"req_9f2e1a"}}
Use the same code path as your REST error handler:
for await (const event of stream) {
  if (event.type === "error") {
    const { message, type, code, request_id } = event.data.error;
    throw new Error(`[${type}/${code}] ${message} (request_id=${request_id})`);
  }
  // ...
}

SDK Usage

The SDK returns a TheoStream — an async iterable with cancel() for mid-generation abort plus final-metadata properties.
import { Theo } from "@hitheo/sdk";

const theo = new Theo({ apiKey: process.env.THEO_API_KEY! });
const stream = theo.stream({ prompt: "Explain DNS" });

// Graceful cancel after 3s (e.g. user clicked "Stop generating")
setTimeout(() => stream.cancel(), 3000);

for await (const event of stream) {
  switch (event.type) {
    case "meta":     console.log("Mode:", event.data.resolved_mode); break;
    case "token":    process.stdout.write(event.data.token); break;
    case "tool":     console.log("Tool:", event.data.name, event.data.status); break;
    case "artifact": console.log("Artifact:", event.data); break;
    case "error":    console.error("Stream error:", event.data.error); break;
    case "done":     console.log("\nCost:", event.data.usage.cost_cents); break;
  }
}

console.log("conversation_id:", stream.conversationId);
console.log("usage:", stream.usage);
console.log("request_id:", stream.requestId);
See Streaming for the conceptual guide.