Integration patterns

Once Ocular is deployed, you need to decide where in your request path to call it. The pattern depends on latency budget, whether you want to react to scoring before the assistant replies, and how much infrastructure you already have.


Minimum viable implementation

The smallest defensible integration for "show crisis resources when the user shows strong self-directed suicide / self-harm signals":

const r = await fetch('http://ocular:8080/classify', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ messages: conversation }),  // the running [{role, content}, ...] array
  signal: AbortSignal.timeout(200),                  // short timeout; see "Handling latency"
}).then(res => res.json()).catch(() => null);        // fail open: null on timeout or network error

if (r?.verdict === 'danger' && r?.subject === 'self') {
  showHotlines();
}

return assistantReply();  // always — Ocular is a signal, not a gate

Two conditions:

  • verdict === 'danger' — the aggregate classification; already incorporates fiction framing, corroboration, and imminence internally.
  • subject === 'self' — filters third-party reports ("my friend is suicidal").

Three principles baked in:

  • Short timeout (200 ms). Ocular latency doesn't block the user reply.
  • Fail open. Null result on network/timeout → no resources shown, reply still goes out.
  • Always continue. The assistant reply is not conditional on Ocular.

Variant — narrower trigger

If you want to react only to strongly-classified, self-attributed, low-fiction signals (higher precision, lower recall), add a fiction gate:

if (r?.verdict === 'danger' && r?.subject === 'self' && r?.fiction < 0.3) {
  showHotlines();
}

Tune the fiction threshold and per-axis risks.<axis>.score thresholds against your own labelled data — see risk-interpretation.md §"Tuning".
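
A sketch of that tuning loop, assuming you've saved /classify responses for a labelled set of conversations; shouldTrigger is your ground-truth label, everything else uses the response fields named above:

// Sweep candidate fiction thresholds against labelled /classify responses.
// labelled: [{ scored: <saved /classify response>, shouldTrigger: boolean }, ...]
function sweepFictionThreshold(labelled, thresholds = [0.1, 0.2, 0.3, 0.4, 0.5]) {
  return thresholds.map(t => {
    let tp = 0, fp = 0, fn = 0;
    for (const { scored, shouldTrigger } of labelled) {
      const fired = scored.verdict === 'danger'
        && scored.subject === 'self'
        && scored.fiction < t;                 // the gate being tuned
      if (fired && shouldTrigger) tp++;
      else if (fired && !shouldTrigger) fp++;
      else if (!fired && shouldTrigger) fn++;
    }
    return {
      threshold: t,
      precision: tp / ((tp + fp) || 1),        // 0 when nothing fired
      recall: tp / ((tp + fn) || 1),
    };
  });
}

Pick the threshold whose precision/recall tradeoff matches your product's tolerance; the same loop works for the per-axis risks.<axis>.score cutoffs.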


Pattern 1 — Inline scoring (synchronous)

The most direct: every time your AI app sees a user message, score it synchronously before the assistant responds.

user message → your app → [score via Ocular] → your app → assistant replies
                             (~30 ms single-turn)

When it fits: chat products where your policy requires reacting to Ocular's classification before the assistant's reply — for example, swapping the system prompt, gating output, or surfacing a UI element. Also when your p99 latency budget can absorb 30-50 ms.

Tradeoffs:

  • Adds latency to every user message. Typically fine for chat UIs; possibly not for voice / real-time.
  • Ocular unavailability blocks your request path. Put a timeout + circuit breaker in front (see below).
  • Scoring cost scales linearly with your chat volume.

Example (Node, express-ish):

async function handleUserMessage(userId, sessionId, text) {
  // Score synchronously; fail open (null) on timeout or network error.
  const scored = await fetch('http://ocular:8080/classify', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text: buildConversationText(sessionId, text),
      session_id: sessionId,
      user_id: userId,
    }),
    signal: AbortSignal.timeout(200),
  }).then(res => res.json()).catch(() => null);

  if (scored?.verdict === 'danger') {
    return renderCrisisResources(userId);
  }

  return normalReply(sessionId, text);
}

Always set a short timeout. Don't let an Ocular hiccup stall your entire conversation.


Pattern 2 — Async scoring (fire and forget)

Score in the background, after the assistant has already replied. Act on the result lazily — dashboards, alerts, post-hoc moderation.

user message → your app → assistant replies (user sees response)
                      ↓
               (background) → Ocular → Console / alert pipeline

When it fits: when the scoring result doesn't need to land before the assistant's reply — dashboards, offline evaluation, alert pipelines that consume the result asynchronously.

Tradeoffs:

  • No added latency to the user.
  • You can't use the score to modify the current reply.
  • If Ocular is down, you lose visibility, but the product keeps working.

Example (Python, async task):

import asyncio
import logging

import httpx

logger = logging.getLogger(__name__)

async def handle_message(user_id, session_id, messages):
    # messages is the running [{role, content}, ...] conversation array.
    reply = await produce_assistant_reply(session_id, messages)
    # Fire-and-forget the scoring call. Hold a reference if tasks vanish:
    # bare asyncio.create_task() results can be garbage-collected mid-flight.
    asyncio.create_task(score_in_background(user_id, session_id, messages))
    return reply

async def score_in_background(user_id, session_id, messages):
    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            await client.post(
                "http://ocular:8080/classify",
                json={
                    "messages": messages,         # the full conversation so far
                    "session_id": session_id,
                    "user_id": user_id,
                    "log": True,                  # opt-in conduit push to Console
                },
            )
    except Exception as e:
        logger.warning("ocular scoring failed: %s", e)

The Ocular response is pushed to Console via the conduit — you don't need to read it in your own app unless you want to persist it.

Important for Console usability: pass messages: [...], not text: .... Console builds its per-turn transcript display from the messages[] array it receives via the conduit push. If you send only text:, Console stores the session with the correct risk verdict and scores but an empty transcript (message_count: 0, no turn rows): Ocular deliberately doesn't forward your raw user content to Console unless you've passed it as structured messages[].

If you can't use messages[] (e.g. the upstream you're wrapping only exposes flat text), two acceptable workarounds:

  • Parse the text on User:/Assistant: delimiters yourself before posting to Ocular (a sketch follows this list).
  • Accept the empty-transcript Console session — verdict + axes still show, and the session appears on dashboards and in watchlist evaluation; only the turn-by-turn UI panel is blank.
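
A minimal sketch of the first workaround, assuming the flat text uses literal "User:" / "Assistant:" line prefixes; adjust the delimiters to whatever your upstream actually emits:

// Split a flat "User: ...\nAssistant: ..." transcript into a messages[] array.
function textToMessages(text) {
  const messages = [];
  for (const line of text.split('\n')) {
    const m = line.match(/^(User|Assistant):\s*(.*)$/);
    if (m) {
      // A delimiter starts a new turn; map it to a role.
      messages.push({ role: m[1] === 'User' ? 'user' : 'assistant', content: m[2] });
    } else if (messages.length > 0) {
      // Continuation line: append to the current turn.
      messages[messages.length - 1].content += '\n' + line;
    }
  }
  return messages;
}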


Pattern 3 — Tee / sample-to-evaluation

Mirror a fraction of your traffic to Ocular for evaluation, without affecting the serving path.

user message → your app → assistant reply → user
                   ↓
             (sample %)
                   ↓
                 Ocular → Console

When it fits: when scoring 100% of traffic is not cost-effective, or when you're evaluating the scoring surface against your own data before committing to an inline integration.

Shape (as offline batch):

  1. Log every conversation to your application log.
  2. A batch job reads the log, samples, and POSTs each to /classify with session_id + agent_id + log: true.
  3. Console stores the scored sessions; per-agent axis breakdowns are at /agents/<agent_id>.

Sampling rate is a budget decision — the scoring surface is stateless, so any fraction works.
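
A sketch of step 2, assuming the log reader yields conversations with structured messages and session/agent IDs already attached (names here are illustrative):

const SAMPLE_RATE = 0.05;  // budget decision: any fraction works

async function sampleAndScore(conversations) {
  for (const convo of conversations) {
    if (Math.random() >= SAMPLE_RATE) continue;  // keep ~5% of traffic
    await fetch('http://ocular:8080/classify', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        messages: convo.messages,      // structured turns, so Console gets a transcript
        session_id: convo.sessionId,
        agent_id: convo.agentId,       // per-agent breakdowns at /agents/<agent_id>
        log: true,                     // conduit push to Console
      }),
      signal: AbortSignal.timeout(5000),  // offline batch: generous budget
    }).catch(err => console.warn('scoring failed:', err));
  }
}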


Pattern 4 — Watchlist-driven alerts (Console)

Console evaluates configurable rules against incoming sessions and POSTs to a webhook URL on match. Rules are edited via the Console UI or /api/watchlists; webhooks can target any HTTPS endpoint.

user message → your app → Ocular (via /classify) → Console → watchlist match → webhook

See console.md §Watchlists for rule syntax, condition paths, and webhook payload shape.
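
A minimal receiver sketch (Express; the endpoint path and the forwardToAlertBus helper are illustrative, and the payload shape is whatever console.md specifies; this just acknowledges and hands off):

import express from 'express';

const app = express();
app.use(express.json());

// Console POSTs the watchlist-match payload here. Acknowledge quickly and
// do the real work asynchronously; serve this behind your TLS terminator.
app.post('/hooks/ocular-watchlist', (req, res) => {
  res.sendStatus(204);
  forwardToAlertBus(req.body).catch(err =>
    console.error('alert forwarding failed:', err));
});

app.listen(9443);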


Pattern 5 — Export rules, evaluate in your own pipeline

Console's GET /api/watchlists?format=export returns a portable JSON representation of the rule set. You can evaluate that JSON in your own rules engine against /classify responses, independent of Console.

Design:    Console → watchlist UI → /api/watchlists?format=export
Run:       your rules engine (reading the exported JSON) → your alert bus

When it fits: when you want rule evaluation outside Console's SQLite store and in-process webhook outbox — for example, higher durability, lower latency, or co-location with existing alert infrastructure.

Console remains usable for session inspection even if you move rule evaluation out.
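
A sketch of the hand-off; the Console host is illustrative, and evaluateRules / publishToAlertBus stand in for your own rules engine and alert bus:

// Pull the portable rule export from Console, then evaluate it locally
// against /classify responses, independent of Console's own outbox.
const rules = await fetch('http://console:8081/api/watchlists?format=export')
  .then(res => res.json());

const scored = await fetch('http://ocular:8080/classify', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ messages: conversation }),
}).then(res => res.json());

for (const match of evaluateRules(rules, scored)) {  // your engine, your rule semantics
  publishToAlertBus(match);                          // your alert infrastructure
}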


Pattern 6 — Fleet (scaled)

Multiple Ocular containers behind a load balancer, all reporting to a single Console instance.

                      ┌── Ocular #1 ──┐
your app's LB ────────├── Ocular #2 ──├──(conduit)──→ Console
                      └── Ocular #N ──┘

When it fits: throughput beyond what a single GPU can deliver (roughly millions of messages/day per box). License tokens don't encode a container limit, so technically any tier can run a fleet — container count is a contractual matter, not a cryptographic one. Check your contract before scaling horizontally.

Notes:

  • Set OCULAR_URL on Console to point at the LB. This only affects Console's own scoring paths (/api/health?deep=true and direct /api/ingest posts without a pre-scored body) — the conduit push goes Ocular→Console over OCULAR_CONSOLE_URL and doesn't use OCULAR_URL at all.
  • The conduit push is a direct HTTP POST from each Ocular to Console. If Console's /api/ingest becomes the bottleneck under burst, a message queue between Ocular and Console is a known-good interposition.


Handling latency

Typical single-turn /classify takes ~30 ms on a 24 GB datacenter-class GPU (A10G reference). That's fast, but not zero, and the tail matters.

Budgets. Set a timeout on every Ocular call. Suggestion:

  • 200 ms for inline scoring (Pattern 1). Enough for normal operation plus some headroom. If scoring takes longer, something's wrong.
  • 5 s for async scoring (Pattern 2). You're not blocking; just bound the retry window.
  • 10 s for trajectory calls (per_turn: true). Long conversations legitimately take hundreds of ms.

Circuit breaking. Wrap Ocular calls in a circuit breaker (e.g. hystrix, pybreaker, resilience4j). If 5 calls fail in a row, stop calling for 30 seconds and log loudly. This prevents cascading failures when the box is unhealthy.
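
If you'd rather not add a dependency, the stated policy (5 consecutive failures opens the breaker for 30 seconds) fits in a few lines. A minimal sketch, without half-open probing or metrics:

let consecutiveFailures = 0;
let openUntil = 0;

async function classifyWithBreaker(body) {
  if (Date.now() < openUntil) return null;  // breaker open: skip scoring, fail open
  try {
    const res = await fetch('http://ocular:8080/classify', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
      signal: AbortSignal.timeout(200),
    });
    if (!res.ok) throw new Error(`ocular ${res.status}`);
    consecutiveFailures = 0;
    return await res.json();
  } catch (err) {
    if (++consecutiveFailures >= 5) {       // 5 in a row: open for 30 s and log loudly
      openUntil = Date.now() + 30_000;
      console.error('ocular circuit open for 30s:', err);
    }
    return null;                            // fail open, as in the patterns above
  }
}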

Warmup. First call after container start takes ~25 s while the model loads. /health returns "starting" during this window — don't route traffic until it's "ok".
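
A readiness gate for deploy scripts or LB registration, assuming the /health body is JSON with a status field that moves from "starting" to "ok" (adjust the parsing if your body differs):

// Poll /health until Ocular reports "ok" before routing traffic to it.
async function waitUntilReady(baseUrl, { intervalMs = 2000, maxWaitMs = 60_000 } = {}) {
  const deadline = Date.now() + maxWaitMs;
  while (Date.now() < deadline) {
    const status = await fetch(`${baseUrl}/health`)
      .then(res => res.json())
      .then(h => h.status)
      .catch(() => 'unreachable');   // container may not be listening yet
    if (status === 'ok') return;
    await new Promise(r => setTimeout(r, intervalMs));  // model load takes ~25 s
  }
  throw new Error('ocular did not become ready in time');
}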


Handling failure

Ocular unavailability categories and what to do:

  • Slow: 5x slower than normal. Your circuit breaker should handle it; alert if sustained.
  • Timeout: no response within budget. Skip scoring for this message; don't block the user reply.
  • 503 "Model still loading": container startup (~25 s). Retry with backoff.
  • 429 "Server overloaded": batch queue saturated. Retry with backoff; scale replicas if sustained.
  • 502 "Remote scoring failed": only when Ocular's SCORING_URL is configured (Ocular-side; distinct from Console's OCULAR_URL). The upstream scorer failed; inspect the remote endpoint.
  • Connection refused: the container isn't running, e.g. a license past its grace period exited at startup. Check the container logs; replace the license if needed.

Ocular availability is not required for your app to serve user messages. The failure modes above are all recoverable with timeouts and fail-open logic in the client.


What to persist

Ocular does not store raw message content. Durable records of scored sessions are the caller's responsibility — persist the /classify response alongside whatever request metadata you want to retain. meta.version identifies the model + heads configuration a given response was produced against.
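
A minimal persistence sketch; db.insert stands in for whatever store you use (hypothetical API), and the point is keeping meta.version next to the scores so records stay comparable across model/heads configurations:

const scored = await fetch('http://ocular:8080/classify', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ messages, session_id: sessionId }),
  signal: AbortSignal.timeout(200),
}).then(res => res.json()).catch(() => null);

if (scored) {
  await db.insert('scored_sessions', {    // hypothetical store API
    session_id: sessionId,
    scored_at: new Date().toISOString(),
    verdict: scored.verdict,
    response: JSON.stringify(scored),     // the full /classify response, durably
    model_version: scored.meta?.version,  // model + heads configuration identifier
  });
}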