
Self-hosted API additions

The OSS Ocular image speaks the same universal /classify contract. This page covers the operational layer on top of that — the things that only make sense when you control the container.

Enterprise customers: refer to the documentation packaged with your release. These public docs describe the open-source build, which has no license layer. Your build adds licensing, expiry handling, Console integration, and other operational surfaces that aren't covered here.

Authentication model

Ocular self-hosted does NOT authenticate requests itself; the application applies no per-request auth. The customer's network (VPC, firewall, reverse proxy) is the trust boundary.

Put your own auth in front (API key in a reverse proxy, mTLS, ingress controller policy, …) if your application needs it.

Additional endpoints

| Method | Path | Purpose |
|---|---|---|
| POST | /classify | Score a conversation (see api-reference) |
| GET | /health | Readiness, mode, and GPU info |
| GET | /manifest | Heads manifest and service introspection |

GET /health

Readiness probe. No request body.

```json
{
  "status": "ok",
  "mode": "local",
  "version": "3ba4207",
  "queue_depth": 0
}
```
| Field | Meaning |
|---|---|
| status | "ok" when the model is loaded and accepting requests; "starting" during warmup (~25 s after container start). |
| mode | "local" (GPU inference, the default) or "remote" (scoring proxied to an external URL, set by SCORING_URL in the container env). |
| version | Deployed release identifier (OCULAR_VERSION from your env). |
| queue_depth | Current inference queue depth; 0 means idle. |

Use status == "ok" as your readiness gate.
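A minimal readiness gate, sketched in POSIX sh. It assumes the container is reachable at http://localhost:8080 (the address used in the /classify example on this page) and that a 60-second deadline with 2-second polls suits your environment. OCULAR_URL, health_is_ok, and wait_for_ocular are hypothetical names, and the status check is a plain regex so nothing beyond curl and grep is needed.

```sh
#!/bin/sh
# Sketch of a readiness gate for self-hosted Ocular.
# OCULAR_URL is a hypothetical variable; the default matches the address
# used in this page's curl example.
OCULAR_URL="${OCULAR_URL:-http://localhost:8080}"

# Succeeds when the /health JSON read from stdin has "status": "ok".
# "starting" (warmup) and any other value fail the check.
health_is_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Poll GET /health until it reports "ok" or the deadline passes.
# $1 = deadline in seconds (default 60), $2 = poll interval (default 2).
wait_for_ocular() {
  local deadline="${1:-60}" interval="${2:-2}" waited=0 body
  while [ "$waited" -lt "$deadline" ]; do
    # -f: treat HTTP errors as failures; -s: no progress output.
    if body="$(curl -fs "$OCULAR_URL/health")" &&
       printf '%s\n' "$body" | health_is_ok; then
      echo "ocular ready after ${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "ocular not ready after ${deadline}s" >&2
  return 1
}
```

Call wait_for_ocular from a deploy script before routing traffic, with a deadline comfortably above the ~25 s warmup.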

GET /manifest

Service manifest — confirms the deployed release.

```json
{
  "version": "3ba4207",
  "mode": "local",
  "heads": 126
}
```
| Field | Meaning |
|---|---|
| version | Deployed release identifier. Matches OCULAR_VERSION in your .env. |
| mode | "local", "remote", or "stub". |
| heads | Number of behavioral classification heads loaded. Omitted in remote mode. |

The manifest is pinned to the release you deployed — it doesn't change at runtime.
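Because the manifest is fixed per release, a post-deploy check can compare its version with the OCULAR_VERSION you shipped. A sketch in POSIX sh, assuming the same default address as the curl example on this page; json_version and check_deployed_version are hypothetical helpers, and the sed extraction only handles a flat "version": "<id>" string pair, not general JSON.

```sh
#!/bin/sh
# Sketch: confirm the running container serves the release you deployed.
# OCULAR_URL is a hypothetical variable; the default mirrors the curl example.
OCULAR_URL="${OCULAR_URL:-http://localhost:8080}"

# Print the first "version" string value found in JSON read from stdin.
# Regex-based: fine for the flat /manifest body, not a general JSON parser.
json_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# $1 = expected release identifier (normally "$OCULAR_VERSION").
check_deployed_version() {
  local expected="$1" actual
  if [ -z "$expected" ]; then
    echo "usage: check_deployed_version EXPECTED_VERSION" >&2
    return 2
  fi
  actual="$(curl -fs "$OCULAR_URL/manifest" | json_version)"
  if [ "$actual" = "$expected" ]; then
    echo "manifest reports $actual as expected"
    return 0
  fi
  echo "version mismatch: expected $expected, got ${actual:-nothing}" >&2
  return 1
}
```

Typical use is check_deployed_version "$OCULAR_VERSION" right after the readiness gate passes.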

Additional /classify request fields

Self-hosted deployments accept four additional request fields tied to the Console-paired logging path:

| Field | Type | Default | Meaning |
|---|---|---|---|
| log | bool | false | When true, Ocular pushes a summary to the configured Console after scoring. Requires session_id and user_id. Returns 400 if either is missing or if OCULAR_CONSOLE_URL isn't set. |
| session_id | string | null | Metadata. Correlates this call to a conversation. Only consumed when log=true. |
| user_id | string | null | Metadata. Identifies the end-user. Only consumed when log=true. |
| agent_id | string | null | Metadata. Optional identifier for which AI agent produced the assistant turns. Only consumed when log=true. |

On logged requests, send messages[] rather than text. Only messages gives Console the per-turn transcript; a text request is still scored and logged, but the session lands in Console with an empty turns[] and no error is reported.

Example with logging to Console (requires OCULAR_CONSOLE_URL in the container env):

```sh
curl -s -X POST http://localhost:8080/classify \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [{"role":"user","content":"I havent felt like myself lately"}],
    "session_id": "conv-42",
    "user_id":    "u-1234",
    "log":        true
  }'
```
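A client-side guard can catch the session_id/user_id half of the log=true 400 before the request leaves your app. It cannot see whether OCULAR_CONSOLE_URL is set inside the container, so that 400 is still possible. The sketch below is hypothetical (classify_logged is not part of Ocular), always sends messages rather than text for the reason given above, and assumes the IDs are plain tokens so it can skip JSON escaping.

```sh
#!/bin/sh
# Hypothetical helper: send a logged /classify request, refusing locally
# when session_id or user_id is missing (Ocular would answer 400).
# OCULAR_URL is an assumed variable; the default mirrors the example above.
OCULAR_URL="${OCULAR_URL:-http://localhost:8080}"

# $1 = session_id, $2 = user_id, $3 = messages as a JSON array string.
classify_logged() {
  local session_id="$1" user_id="$2" messages_json="$3" body
  if [ -z "$session_id" ] || [ -z "$user_id" ]; then
    echo "log=true requires both session_id and user_id" >&2
    return 2
  fi
  # No JSON escaping below, so reject IDs that would break the body.
  case "$session_id$user_id" in
    *'"'* | *'\'*)
      echo "IDs must not contain quotes or backslashes" >&2
      return 2 ;;
  esac
  body="$(printf '{"messages":%s,"session_id":"%s","user_id":"%s","log":true}' \
    "$messages_json" "$session_id" "$user_id")"
  # Always messages (not text), so Console receives the per-turn transcript.
  curl -s -X POST "$OCULAR_URL/classify" \
    -H 'Content-Type: application/json' \
    -d "$body"
}
```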

Additional status codes

In addition to the universal codes:

| Code | When |
|---|---|
| 400 | Also emitted when log=true is set but OCULAR_CONSOLE_URL isn't configured, or when log=true is set without both session_id and user_id. |
| 429 | Batch queue saturated (self-host runs a single in-flight queue per container). Retry with backoff. |
| 502 | Remote scoring failed. Only emitted when the container is configured with an upstream SCORING_URL (relay mode; /health reports mode "remote"). |
| 503 | Model still warming up (first ~25 s after container start). Retry with backoff. |
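Both 429 and 503 call for a retry with backoff. A minimal sketch in POSIX sh: the attempt count (5) and the doubling delay starting at 1 s are illustrative choices, not documented limits, and classify_with_retry is a hypothetical helper. Any other non-2xx status, including 400 and 502, is treated as non-retryable here.

```sh
#!/bin/sh
# Sketch: POST a JSON body to /classify, retrying 429 (queue saturated) and
# 503 (warming up) with exponential backoff. OCULAR_URL, the attempt count,
# and the delays are assumptions, not documented values.
OCULAR_URL="${OCULAR_URL:-http://localhost:8080}"

# $1 = JSON request body, $2 = max attempts (default 5).
# Prints the response body on success; returns 1 otherwise.
classify_with_retry() {
  local body="$1" attempts="${2:-5}" delay=1 n=1 out code
  while :; do
    # -w appends the HTTP status code on its own final line.
    out="$(curl -s -X POST "$OCULAR_URL/classify" \
      -H 'Content-Type: application/json' \
      -d "$body" -w '\n%{http_code}')"
    code="$(printf '%s\n' "$out" | tail -n 1)"
    case "$code" in
      2??)
        printf '%s\n' "$out" | sed '$d'   # drop the status line
        return 0 ;;
      429|503)
        if [ "$n" -ge "$attempts" ]; then
          echo "giving up after $n attempts (HTTP $code)" >&2
          return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))
        n=$((n + 1)) ;;
      *)
        # 400, 502, connection failures (000), etc.: not retried here.
        echo "HTTP $code" >&2
        return 1 ;;
    esac
  done
}
```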

Environment variables

Set these in customer.env (or whichever env source your Compose or Kubernetes deployment uses). The table lists selected variables; see deployment.md for the full list.

| Var | Purpose |
|---|---|
| OCULAR_VERSION | Required at startup. Release identifier; surfaced in /health, /manifest, and meta.version. |
| OCULAR_CONSOLE_URL | Console ingest endpoint (e.g. http://console:3950/api/ingest). Required to use log: true. |
| OCULAR_MAX_REQUEST_BYTES | Request body size cap. Default 1 MiB. |
| SCORING_URL | Optional. When set, the container proxies inference to an upstream scorer (relay mode). |
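A preflight check can catch a missing OCULAR_VERSION before you deploy. The sketch below is hypothetical: it validates whatever is in the current shell environment, so load customer.env into the shell first, and it assumes OCULAR_MAX_REQUEST_BYTES takes a plain integer byte count.

```sh
#!/bin/sh
# Hypothetical preflight check for the variables in the table above.
# It inspects the current shell environment, so load customer.env first.
# Assumption: OCULAR_MAX_REQUEST_BYTES is a plain integer byte count.
check_ocular_env() {
  local rc=0
  if [ -z "${OCULAR_VERSION:-}" ]; then
    echo "error: OCULAR_VERSION is required at startup" >&2
    rc=1
  fi
  case "${OCULAR_MAX_REQUEST_BYTES:-}" in
    '') ;;  # unset: the 1 MiB default applies
    *[!0-9]*)
      echo "error: OCULAR_MAX_REQUEST_BYTES must be an integer byte count" >&2
      rc=1 ;;
  esac
  if [ -z "${OCULAR_CONSOLE_URL:-}" ]; then
    echo "note: OCULAR_CONSOLE_URL unset; requests with log: true will get 400" >&2
  fi
  if [ -n "${SCORING_URL:-}" ]; then
    echo "note: SCORING_URL set; inference is proxied upstream (relay mode)" >&2
  fi
  return "$rc"
}

# Usage, in one shell:
#   set -a; . ./customer.env; set +a
#   check_ocular_env || exit 1
```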

Rate limits and throughput

Self-hosted Ocular doesn't apply per-request rate limits at the application layer; it scores as fast as your hardware allows. Under sustained overload, /classify returns 429 once the in-flight queue saturates. If you need ingress protection (e.g. against runaway loops in your own app), put a rate-limiting reverse proxy in front.

Practical throughput on a 24 GB datacenter-class GPU (NVIDIA A10G reference; production target for the on-prem image):

  • Single-turn /classify: ~188 ms p50, ~3.76 req/s sustained at concurrency=4, with no failed requests under a 4× burst.
  • Trajectory (stride=3, ~68 turns): ~880 ms p50 at concurrency=1.
  • Cold-start adds ~28 s on first request after container start.

24 GB and 80 GB datacenter-class cards produce comparable trajectory latency — the workload is memory-bandwidth bound. For higher throughput, run multiple containers behind a load balancer.
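A rough capacity estimate follows from the single-turn figure. If each container sustains about 3.76 req/s and containers scale independently behind the load balancer, each on its own GPU (an assumption that ignores balancer overhead and uneven request sizes), the count needed is the target rate divided by 3.76, rounded up: 10 req/s needs ceil(10 / 3.76) = ceil(2.66) = 3 containers. A sketch in POSIX sh using awk for the division; containers_needed is a hypothetical helper.

```sh
#!/bin/sh
# Rough container count for a target single-turn /classify rate.
# Assumes linear scaling, one GPU per container, and the ~3.76 req/s
# reference figure from this page unless a measured rate is given.
# $1 = target req/s, $2 = per-container req/s (default 3.76).
containers_needed() {
  awk -v target="$1" -v per="${2:-3.76}" 'BEGIN {
    n = target / per
    c = int(n)
    if (c < n) c++      # round up: a fractional container means one more
    if (c < 1) c = 1    # always at least one container
    print c
  }'
}
```

Pass your own measured per-container rate as the second argument when your hardware or conversation lengths differ from the reference setup.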

See also