Self-hosted API additions
The OSS Ocular image speaks the same universal /classify contract. This page covers the operational layer on top of that — the things that only make sense when you control the container.
Enterprise customers: refer to the documentation packaged with your release. These public docs describe the open-source build, which has no license layer. Your build adds licensing, expiry handling, Console integration, and other operational surfaces that aren't covered here.
Authentication model
Ocular self-hosted does NOT authenticate individual requests. The customer's network (VPC, firewall, reverse proxy) is the trust boundary.
Put your own auth in front (API key in a reverse proxy, mTLS, ingress controller policy, …) if your application needs it.
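One way to put auth in front is a reverse proxy or small gateway that rejects requests lacking a shared API key before forwarding them to the container. The following is a minimal sketch of that check, assuming a hypothetical `x-api-key` header and a hypothetical `is_authorized` helper. Ocular does not read or define this header, and mTLS or ingress policy may suit your environment better.

```python
import hmac
from typing import Mapping

# Hypothetical header name: Ocular itself does not read or define any auth header.
API_KEY_HEADER = "x-api-key"


def is_authorized(headers: Mapping[str, str], expected_key: str) -> bool:
    """Return True if the request carries the expected shared API key.

    Header names are matched case-insensitively (as HTTP requires), and the
    key is compared in constant time so response timing does not reveal how
    much of a guessed key was correct.
    """
    if not expected_key:
        # Fail closed if the gateway was started without a key configured.
        return False
    normalized = {name.lower(): value for name, value in headers.items()}
    presented = normalized.get(API_KEY_HEADER, "")
    return hmac.compare_digest(presented.encode("utf-8"), expected_key.encode("utf-8"))
```

A gateway built this way would answer 401 itself when the check fails and forward the request to Ocular unchanged otherwise.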
Additional endpoints
| Method | Path | Purpose |
|---|---|---|
| POST | /classify | Score a conversation (see api-reference) |
| GET | /health | Readiness + mode + GPU info |
| GET | /manifest | Heads manifest + service introspection |
GET /health
Readiness probe. No request body.
```json
{
  "status": "ok",
  "mode": "local",
  "version": "3ba4207",
  "queue_depth": 0
}
```

| Field | Meaning |
|---|---|
| status | "ok" when the model is loaded and accepting requests. "starting" during warmup (~25 s after container start). |
| mode | "local" (GPU inference, the default) or "remote" (scoring proxied via an external URL, set by SCORING_URL in the container env). |
| version | Deployed release identifier (OCULAR_VERSION from your env). |
| queue_depth | Current inference queue depth. 0 means idle. |
Use status == "ok" as your readiness gate.
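A deploy script or sidecar can poll /health until status is "ok" before routing traffic. Below is a minimal sketch using only the standard library. `fetch_health` and `wait_until_ready` are hypothetical helper names, the base URL follows the localhost:8080 used in the curl example on this page, and the 60 s default timeout is an assumption chosen to comfortably cover the ~25 s warmup. Any HTTP error or unparseable response is treated as "not ready yet".

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_health(base_url: str, timeout: float = 2.0) -> Optional[dict]:
    """GET {base_url}/health and return the parsed JSON, or None on any failure."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, non-2xx status, or a non-JSON body.
        return None


def wait_until_ready(
    check: Callable[[], Optional[dict]],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll `check` until it reports status == "ok", or raise TimeoutError."""
    deadline = clock() + timeout_s
    while True:
        health = check()
        if health is not None and health.get("status") == "ok":
            return health
        if clock() >= deadline:
            raise TimeoutError("Ocular did not report status 'ok' before the deadline")
        sleep(interval_s)
```

Usage: `wait_until_ready(lambda: fetch_health("http://localhost:8080"))`. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.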
GET /manifest
Service manifest — confirms the deployed release.
```json
{
  "version": "3ba4207",
  "mode": "local",
  "heads": 126
}
```

| Field | Meaning |
|---|---|
| version | Deployed release identifier. Matches OCULAR_VERSION in your .env. |
| mode | "local" / "remote" / "stub". |
| heads | Number of behavioral classification heads loaded. Omitted in remote mode. |
The manifest is pinned to the release you deployed — it doesn't change at runtime.
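Because the manifest is pinned to the deployed release, a post-deploy smoke test can compare it against the OCULAR_VERSION you meant to ship. This is a sketch with a hypothetical `check_manifest` helper. Requiring an integer `heads` in local mode is an assumption drawn from the table above, which documents `heads` as omitted only in remote mode.

```python
from typing import Any, List, Mapping

KNOWN_MODES = ("local", "remote", "stub")


def check_manifest(manifest: Mapping[str, Any], expected_version: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems: List[str] = []
    version = manifest.get("version")
    if version != expected_version:
        problems.append(f"version {version!r} != expected {expected_version!r}")
    mode = manifest.get("mode")
    if mode not in KNOWN_MODES:
        problems.append(f"unexpected mode {mode!r}")
    # Assumption: a local-mode manifest always reports how many heads it loaded.
    if mode == "local" and not isinstance(manifest.get("heads"), int):
        problems.append("local-mode manifest has no integer 'heads' count")
    return problems
```

Failing the deploy when the returned list is non-empty catches a container that came up on a stale image or with the wrong env file.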
Additional /classify request fields
Self-hosted deployments accept four additional request fields tied to the Console-paired logging path:
| Field | Type | Default | Meaning |
|---|---|---|---|
| log | bool | false | When true, Ocular pushes a summary to the configured Console after scoring. Requires session_id + user_id. Returns 400 if either is missing or if OCULAR_CONSOLE_URL isn't set. |
| session_id | string | null | Metadata. Correlates this call to a conversation. Only consumed when log=true. |
| user_id | string | null | Metadata. Identifies the end-user. Only consumed when log=true. |
| agent_id | string | null | Metadata. Optional identifier for which AI agent produced the assistant turns. Only consumed when log=true. |
Use messages[] (not text) on logged requests. Only messages gives
Console the per-turn transcript; a text request still lands a scored
session in Console, but with an empty turns[] and no error or warning.
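The logging rules above can be enforced on the client before a request is sent, so a misconfigured caller fails loudly instead of collecting 400s or transcript-less sessions. Below is a minimal sketch with a hypothetical `build_logged_request` helper; the message shape mirrors the curl example on this page. It cannot check OCULAR_CONSOLE_URL, which lives in the container env.

```python
from typing import Any, Dict, List, Optional


def build_logged_request(
    messages: List[Dict[str, str]],
    session_id: str,
    user_id: str,
    agent_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a /classify body with log=true, mirroring the server-side 400 checks."""
    if not session_id or not user_id:
        raise ValueError("log=true requires both session_id and user_id")
    if not messages:
        # A text-only request would log a session with an empty turns[].
        raise ValueError("logged requests should send a non-empty messages[] list")
    body: Dict[str, Any] = {
        "messages": messages,
        "session_id": session_id,
        "user_id": user_id,
        "log": True,
    }
    if agent_id is not None:
        body["agent_id"] = agent_id  # optional metadata, omitted when unknown
    return body
```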
Example with logging to Console (requires OCULAR_CONSOLE_URL in the
container env):

```bash
curl -s -X POST http://localhost:8080/classify \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [{"role":"user","content":"I havent felt like myself lately"}],
    "session_id": "conv-42",
    "user_id": "u-1234",
    "log": true
  }'
```

Additional status codes
In addition to the universal codes:
| Code | When |
|---|---|
| 400 | Also emitted when log=true is set but OCULAR_CONSOLE_URL isn't configured, or log=true is set without both session_id and user_id. |
| 429 | Batch queue saturated (self-host has a single in-flight queue per container). Retry with backoff. |
| 502 | Remote scoring failed. Only emitted when the container is configured with an upstream SCORING_URL (relay mode). |
| 503 | Model still warming up (first ~25 s after container start). Retry with backoff. |
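"Retry with backoff" for 429 and 503 can be implemented as capped exponential backoff with jitter. This is a sketch, not an official client: `call_with_backoff` and `make_urllib_send` are hypothetical helpers, and the attempt count and delays are illustrative defaults. Passing 502 straight through is a policy choice here; in relay mode you may decide to retry it too.

```python
import json
import random
import time
import urllib.error
import urllib.request
from typing import Any, Callable, Mapping, Tuple

RETRYABLE = {429, 503}  # queue saturated, model warming up


def call_with_backoff(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 6,
    base_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        # "Full jitter": sleep a random time up to the capped exponential delay,
        # so many clients hitting a saturated queue do not retry in lockstep.
        delay = min(max_delay_s, base_delay_s * (2 ** attempt))
        sleep(random.uniform(0, delay))
    raise ValueError("max_attempts must be >= 1")


def make_urllib_send(url: str, payload: Mapping[str, Any]) -> Callable[[], Tuple[int, bytes]]:
    """Return a zero-argument sender that POSTs `payload` as JSON to `url`."""
    data = json.dumps(payload).encode("utf-8")

    def send() -> Tuple[int, bytes]:
        req = urllib.request.Request(
            url, data=data, headers={"Content-Type": "application/json"}, method="POST"
        )
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                return resp.status, resp.read()
        except urllib.error.HTTPError as err:
            # urllib raises for 4xx/5xx; surface them as (status, body) instead.
            return err.code, err.read()

    return send
```

Usage: `call_with_backoff(make_urllib_send("http://localhost:8080/classify", body))`.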
Environment variables
Configured via customer.env (or whatever your compose / k8s env source
is). Selected highlights — see deployment.md for the full list.
| Var | Purpose |
|---|---|
| OCULAR_VERSION | Required at startup. Release identifier; surfaced in /health, /manifest, and meta.version. |
| OCULAR_CONSOLE_URL | Console ingest endpoint (e.g. http://console:3950/api/ingest). Required to use log: true. |
| OCULAR_MAX_REQUEST_BYTES | Request body size cap. Default 1 MiB. |
| SCORING_URL | Optional. When set, container proxies inference to an upstream scorer (relay mode). |
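A pre-deploy check (for example in CI, against customer.env) can catch a missing OCULAR_VERSION or a logging client pointed at a container without OCULAR_CONSOLE_URL. The sketch below uses hypothetical helpers `summarize_env` and `fits_request_cap`. It assumes OCULAR_MAX_REQUEST_BYTES holds a plain integer byte count; see deployment.md for the format the image actually accepts.

```python
import json
from typing import Any, Dict, Mapping

DEFAULT_MAX_REQUEST_BYTES = 1024 * 1024  # documented default: 1 MiB


def summarize_env(env: Mapping[str, str]) -> Dict[str, Any]:
    """Derive what a given env enables, mirroring the variable table above."""
    version = env.get("OCULAR_VERSION")
    if not version:
        raise ValueError("OCULAR_VERSION is required at startup")
    raw_max = env.get("OCULAR_MAX_REQUEST_BYTES")
    # Assumption: the value is a plain integer number of bytes.
    max_bytes = int(raw_max) if raw_max else DEFAULT_MAX_REQUEST_BYTES
    return {
        "version": version,
        "logging_available": bool(env.get("OCULAR_CONSOLE_URL")),  # log: true allowed
        "relay_mode": bool(env.get("SCORING_URL")),  # inference proxied upstream
        "max_request_bytes": max_bytes,
    }


def fits_request_cap(body: Mapping[str, Any], max_bytes: int) -> bool:
    """Check the body's size exactly as json.dumps would serialize it for sending."""
    return len(json.dumps(body).encode("utf-8")) <= max_bytes
```

Usage: `summarize_env(os.environ)` inside the container, or pass a dict parsed from customer.env.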
Rate limits and throughput
Self-hosted Ocular doesn't apply per-request rate limits at the
application layer; it will score as fast as your hardware allows.
Under sustained overload, /classify returns 429 once the in-flight
queue saturates. If you need ingress protection (e.g. against runaway
loops in your own app), put a reverse proxy with rate limiting in
front.
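Proxy rate limiters are commonly token buckets, and the same structure can also sit inside your own app as a guard against runaway loops. The following is a minimal token-bucket sketch; the `TokenBucket` class and its parameters are illustrative and not part of Ocular. Choose `rate` below the throughput you measure on your hardware.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilling at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False if the caller should back off."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Calls that get False can be queued, dropped, or delayed in your app instead of piling onto the container's single in-flight queue.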
Practical throughput on a 20 GB datacenter-class GPU (A10G reference; production target for the on-prem image):
- Single-turn /classify: ~188 ms p50, ~3.76 req/s sustained at concurrency=4 (zero-fail under 4× burst).
- Trajectory (stride=3, ~68 turns): ~880 ms p50 at concurrency=1.
- Cold-start adds ~28 s on first request after container start.
24 GB and 80 GB datacenter-class cards produce comparable trajectory latency — the workload is memory-bandwidth bound. For higher throughput, run multiple containers behind a load balancer.
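To size a multi-container deployment, the single-turn reference figure can be turned into a rough container count. This is a back-of-the-envelope sketch: `containers_needed` is a hypothetical helper, and it assumes throughput scales linearly with the number of containers behind the load balancer, which you should verify on your own hardware.

```python
import math

# Reference figure quoted above: ~3.76 req/s sustained single-turn /classify
# per container on an A10G-class GPU at concurrency=4.
SINGLE_TURN_RPS_PER_CONTAINER = 3.76


def containers_needed(
    target_rps: float,
    per_container_rps: float = SINGLE_TURN_RPS_PER_CONTAINER,
    headroom: float = 0.0,
) -> int:
    """Estimate containers needed to sustain target_rps.

    `headroom` is the fraction of each container's capacity kept free for
    bursts (0 <= headroom < 1). Assumes linear scaling across containers.
    """
    if target_rps < 0 or per_container_rps <= 0 or not 0 <= headroom < 1:
        raise ValueError("invalid capacity-planning inputs")
    usable = per_container_rps * (1 - headroom)
    return math.ceil(target_rps / usable)
```

For example, 10 req/s of single-turn traffic gives ceil(10 / 3.76) = 3 containers, or 4 if each container keeps 25% headroom. Trajectory traffic is far heavier per request, so plan for it separately.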
See also
- api-reference — the universal request/response contract
- deployment — installation, GPU requirements, compose files, troubleshooting
- risk-interpretation — what verdicts and risks mean