Self-hosted API additions

The OSS Ocular image speaks the same universal /classify contract. This page covers the operational layer on top of that — the things that only make sense when you control the container.

Enterprise customers: refer to the documentation packaged with your release. These public docs describe the open-source build, which has no license layer. Your build adds licensing, expiry handling, Console integration, and other operational surfaces that aren't covered here.

Authentication model

Ocular self-hosted does NOT apply per-request authentication at the network layer. The customer's network (VPC, firewall, reverse proxy) is the trust boundary.

Put your own auth in front (API key in a reverse proxy, mTLS, ingress controller policy, …) if your application needs it.

Additional endpoints

Method	Path	Purpose
`POST`	`/classify`	Score a conversation — see api-reference
`GET`	`/health`	Readiness + mode + GPU info
`GET`	`/manifest`	Heads manifest + service introspection

GET /health

Readiness probe. No request body.

{
  "status": "ok",
  "mode": "local",
  "version": "3ba4207",
  "queue_depth": 0
}

Field	Meaning
`status`	`"ok"` when the model is loaded and accepting requests. `"starting"` during warmup (~25 s after container start).
`mode`	`"local"` (GPU inference — the default) or `"remote"` (scoring proxied via an external URL, set by `SCORING_URL` in the container env).
`version`	Deployed release identifier (`OCULAR_VERSION` from your env).
`queue_depth`	Current inference queue depth. `0` means idle.

Use status == "ok" as your readiness gate.

GET /manifest

Service manifest — confirms the deployed release.

{
  "version": "3ba4207",
  "mode": "local",
  "heads": 126
}

Field	Meaning
`version`	Deployed release identifier. Matches `OCULAR_VERSION` in your `.env`.
`mode`	`"local"` / `"remote"` / `"stub"`.
`heads`	Number of behavioral classification heads loaded. Omitted in `remote` mode.

The manifest is pinned to the release you deployed — it doesn't change at runtime.

Additional `/classify` request fields

Self-hosted deployments accept three additional request fields tied to the Console-paired logging path:

Field	Type	Default	Meaning
`log`	bool	`false`	When `true`, Ocular pushes a summary to the configured Console after scoring. Requires `session_id` + `user_id`. Returns `400` if either is missing or if `OCULAR_CONSOLE_URL` isn't set.
`session_id`	string	`null`	Metadata. Correlates this call to a conversation. Only consumed when `log=true`.
`user_id`	string	`null`	Metadata. Identifies the end-user. Only consumed when `log=true`.
`agent_id`	string	`null`	Metadata. Optional identifier for which AI agent produced the assistant turns. Only consumed when `log=true`.

Use messages[] (not text:) on logged requests — only messages gives Console the per-turn transcript; text: lands a scored session with empty turns[], silently.

Example with logging to Console (requires OCULAR_CONSOLE_URL in the container env):

curl -s -X POST http://localhost:8080/classify \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [{"role":"user","content":"I havent felt like myself lately"}],
    "session_id": "conv-42",
    "user_id":    "u-1234",
    "log":        true
  }'

Additional status codes

In addition to the universal codes:

Code	When
`400`	Also emitted when `log=true` is set but `OCULAR_CONSOLE_URL` isn't configured, or `log=true` is set without both `session_id` and `user_id`.
`429`	Batch queue saturated (self-host has a single in-flight queue per container). Retry with backoff.
`502`	Remote scoring failed. Only emitted when the container is configured with an upstream `SCORING_URL` (relay mode).
`503`	Model still warming up (first ~25 s after container start). Retry with backoff.

Environment variables

Configured via customer.env (or whatever your compose / k8s env source is). Selected highlights — see deployment.md for the full list.

Var	Purpose
`OCULAR_VERSION`	Required at startup. Release identifier; surfaced in `/health` + `/manifest` + `meta.version`.
`OCULAR_CONSOLE_URL`	Console ingest endpoint (e.g. `http://console:3950/api/ingest`). Required to use `log: true`.
`OCULAR_MAX_REQUEST_BYTES`	Request body size cap. Default 1 MiB.
`SCORING_URL`	Optional. When set, container proxies inference to an upstream scorer (relay mode).

Rate limits and throughput

Self-host Ocular doesn't apply per-request rate limits at the application layer — it will score as fast as your hardware allows. Under sustained overload, /classify returns 429 once the in-flight queue saturates. If you need ingress protection (e.g. against runaway loops in your own app), put a reverse proxy with rate limiting in front.

Practical throughput on a 20 GB datacenter-class GPU (A10G reference; production target for the on-prem image):

Single-turn /classify: ~188 ms p50, ~3.76 req/s sustained at concurrency=4 (zero-fail under 4× burst).
Trajectory (stride=3, ~68 turns): ~880 ms p50 at concurrency=1.
Cold-start adds ~28 s on first request after container start.

24 GB and 80 GB datacenter-class cards produce comparable trajectory latency — the workload is memory-bandwidth bound. For higher throughput, run multiple containers behind a load balancer.