Deploying Ocular on-prem
This document walks you from "I received the tarball and a license token"
to "the service is running and I've made my first /classify call."
Everything in it has been executed end-to-end on a clean Ubuntu 24.04
host with an Ampere-class or newer NVIDIA datacenter GPU. If a step
doesn't work for you, the
Troubleshooting section covers every failure mode we've
actually seen.
For higher-level context on what Ocular is and isn't, see the other
docs in your customer portal — risk-interpretation.md, api-reference.md,
and integration-patterns.md in particular.
What you received
From NOPE:
- ocular-platform-<version>.tar.zst — one compressed Docker tarball containing both images (ocular:<version> and ocular-console:<version>). Typical size ~8-9 GB. Model weights are baked into the image — no HuggingFace pull, no external dependency at run time.
- ocular-platform-<version>.tar.zst.sha256 — SHA256 sidecar for verification.
- A signed license token — the value for OCULAR_LICENSE_KEY. Format OCULAR-v1.<base64-payload>.<base64-signature>, ECDSA P-256, issued with a tier (pilot/production/enterprise) and an expiry.
- customer-compose.yml (included in this repo at docker/customer-compose.yml) — the one-file Docker Compose spec that starts everything.
- customer.env.example (in docker/customer.env.example) — env template.
The Ocular image contains:
- A language model with merged adapter weights, packaged for inference.
- The trained behavioral-probe heads and calibration data.
- The Python service (FastAPI) exposing /classify, /health, /manifest.
- License-enforcement cryptography (ECDSA P-256 public key baked in).
The Console image contains:
- A SvelteKit UI + SQLite-backed store.
- Ingest, watchlist, webhook, and session-detail endpoints.
- No GPU dependency, no model.
No outbound network is required for operation once the image is loaded. Console's webhook outbox will call out to your configured receivers only if you set webhooks up — otherwise Ocular and Console are entirely network-contained.
Before you start
Hardware
The supported floor depends on which mode you call. Single-pass
(per_turn=false) classifies one assessment per conversation and is
GPU-light. Trajectory (per_turn=true) emits a score at every speaker
boundary and is GPU-memory-bound on long inputs.
| Resource | Single-pass only (per_turn=false) | Trajectory mode (per_turn=true) |
|---|---|---|
| GPU | 8 GB+ VRAM, Ampere or newer datacenter-class | 24 GB+ VRAM, Ampere or newer datacenter-class |
| CPU | 4 cores | 4 cores (8+ for sustained concurrent ingest) |
| RAM | 16 GB | 16 GB (32 GB+ for sustained concurrent ingest) |
| Disk | 30 GB free (image, tarball, Console DB, working space) | 30 GB free; 100 GB+ SSD if retaining long histories |
| Network | None required at runtime; outbound only if using webhooks | (same) |
Trajectory mode peaks around 14 GiB transient working memory beyond the resident model on long conversations (~200+ turns). Cards smaller than 24 GiB VRAM may run out of memory under sustained concurrent trajectory load on long inputs; single-pass mode is unaffected at lower specs.
On a 24 GB datacenter-class GPU (A10G reference), cold boot is ~20 s,
/classify p50 is ~25 ms, and trajectory (8-message conversation,
stride=3) p50 is ~42 ms. Larger cards (A100, H100, L40) perform
proportionally better on batched workloads. Pre-Ampere GPUs are not
supported.
Host OS
- Linux (x86_64). Tested on Ubuntu 22.04 and 24.04 LTS.
- Other distros work if they can run NVIDIA Container Toolkit.
- Docker on Mac and Docker Desktop on Windows are not supported — GPU passthrough isn't compatible with Docker Desktop's VM.
- Bare metal or full-virtualization VMs only. Container-in-container environments (Vast.ai, RunPod, other OCI-pod platforms) block mount and unshare capabilities and the Ocular container will fail to start with "operation not permitted". If you don't have bare metal, use a VM with root-level Docker (AWS EC2, GCE, Hetzner Dedicated, Vultr GPU, DigitalOcean GPU Droplets).
Software prerequisites
Install in order:
# Docker Engine + Compose v2 (from Docker's official repo)
curl -fsSL https://get.docker.com | sh
sudo systemctl enable --now docker
# If you want to run docker without `sudo`, add your user to the `docker`
# group. This requires a fresh login session to take effect — running
# `usermod -aG` and then `docker load` in the same shell will fail with
# "permission denied on the Docker daemon socket" because the group
# membership didn't propagate into the current process. Either log out
# and back in, or use `sg docker -c "<command>"` to run one-offs with
# the group active. The rest of this doc uses `sudo docker` throughout
# to sidestep this entirely; drop the `sudo` if your shell already has
# docker-group membership.
# sudo usermod -aG docker $USER # then re-login
# NVIDIA Container Toolkit (the `--gpus all` plumbing)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
| sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
| sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
| sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Utilities
sudo apt-get install -y zstd curl ca-certificates
Sanity check:
# On host — shows your GPU
nvidia-smi
# Inside a container — proves the runtime wiring
sudo docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
If either command doesn't return your GPU, stop here and fix the stack before proceeding. Ocular will not start without GPU access.
Step 1 — Transfer and verify the tarball
Get the platform artifacts and the customer bundle onto the host. Two common paths:
Option A — download directly on the host from the customer portal (fastest, no laptop round-trip; requires outbound HTTPS from the box):
# Replace <version> with whatever NOPE shipped you. Use curl -O or
# wget so the file is saved with its canonical name — this matters
# because `sha256sum -c` reads the filename from the sidecar and
# compares against the local filename verbatim; saving as anything
# else fails the check.
mkdir -p /opt/ocular && cd /opt/ocular
curl -fsSL --retry 5 -C - -O '<portal-platform-tarball-download-url>'
curl -fsSL --retry 5 -C - -O '<portal-platform-sha-download-url>'
curl -fsSL --retry 5 -C - -O '<portal-customer-bundle-download-url>'
tar -xzf customer-bundle-<version>.tar.gz
cd customer-bundle-<version>
Option B — scp from your laptop (if your host has no outbound network, or you've already pulled the artifacts locally):
scp ocular-platform-<version>.tar.zst root@your-box:/opt/ocular/
scp ocular-platform-<version>.tar.zst.sha256 root@your-box:/opt/ocular/
scp customer-bundle-<version>.tar.gz root@your-box:/opt/ocular/
# Then on the host:
cd /opt/ocular && tar -xzf customer-bundle-<version>.tar.gz
cd customer-bundle-<version>
The customer bundle unpacks into a subdirectory named
customer-bundle-<version>/ — every subsequent command in this doc
assumes you've cd'd into that directory (the compose file lives
there, the env template lives there, this very doc is there).
Verify the SHA256 before loading:
# From /opt/ocular (where the tarball + sidecar sit):
sha256sum -c ocular-platform-<version>.tar.zst.sha256
# → ocular-platform-<version>.tar.zst: OK
If verification fails, do not proceed. The image is not safe to load. Re-download and recheck; contact support if the problem persists.
Load both images into Docker:
zstd -d -c /opt/ocular/ocular-platform-<version>.tar.zst | sudo docker load
# → Loaded image: ocular:<version>
# → Loaded image: ocular-console:<version>
This takes ~2-3 minutes on modern hardware; most of it is disk write.
Confirm the images are present:
sudo docker images | grep -E 'ocular(-console)?:<version>'
The tarball can be deleted after load if disk is tight:
rm ocular-platform-<version>.tar.zst
Step 2 — Configure environment
Copy the env template and fill in the required values:
cp customer.env.example .env
chmod 600 .env  # contains secrets
The rename is two conventions stacked: Docker Compose auto-loads a file
literally named .env (no --env-file flag needed for the bare
docker compose -f customer-compose.yml ... invocations used throughout
this doc), and the .example suffix is the standard marker for "template,
safe to commit" — the live .env should be gitignored and chmod 600,
since it'll hold your license token.
Edit .env:
# Must match the tag baked into the images you loaded.
# Get this from `docker images` output or the filename of the tarball.
OCULAR_VERSION=<version>
# Your signed license token. Required — Ocular refuses to start without it.
# Store this only in .env, never commit to git, rotate when NOPE re-issues.
OCULAR_LICENSE_KEY=OCULAR-v1.<base64-payload>.<base64-signature>
# Optional: Console session retention (default 7 days).
RETENTION_DAYS=7
# Optional: upstream transcript provider (leave unset if you don't have one).
# PROVIDER_URL=http://your-app.internal:8000
On the license token: keep it in .env (mode 600). Do not bake it into
the image, commit it to source, or put it in a Docker build ARG. If it
leaks you need to contact NOPE to rotate.
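A quick way to confirm the token landed in .env without echoing the whole secret into your terminal or shell history (a minimal sketch; adjust the path if you keep .env elsewhere):
# Prints only the prefix if the line exists; no output means the key is
# missing or malformed.
grep -o '^OCULAR_LICENSE_KEY=OCULAR-v1' .env
# → OCULAR_LICENSE_KEY=OCULAR-v1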
Step 3 — Decide your deployment mode
customer-compose.yml supports three usage modes. The first two are selected by
whether you pass --profile console to docker compose; the third uses the same
invocation as the operational mode and differs only in how you use Console.
Ocular only, no Console
Your app calls Ocular /classify directly. Console never starts. No
webhooks, no operator UI. Simplest deployment; lowest resource footprint.
sudo docker compose -f customer-compose.yml up -d
Use this if:
- You already have a monitoring/alerting pipeline and just need scoring.
- You want to minimize attack surface and operational surface.
- You're piloting the scoring API before deciding on Console.
Ocular + Console (operational)
Full stack. Console receives scored sessions via a conduit push from Ocular, evaluates your configured watchlists, and can fire webhooks. Includes an operator UI.
sudo docker compose -f customer-compose.yml --profile console up -d
Use this if:
- You want session-level history + UI for operators/reviewers.
- You want configurable watchlist rules with webhook delivery.
- You need to iterate on detection rules before wiring your own pipeline.
CONSOLE_MODE=full in customer-compose.yml enables Console's
mutating endpoints (creating/editing watchlists, changing retention, CSV
export). The Console image defaults to readonly if the var is absent —
that's a fail-closed default. The compose file sets full explicitly so
an operational deploy has the full UI out of the box.
Design in Console, export to your own pipeline
Same compose invocation as the operational mode — the difference is
intent: you run Console until your watchlist rules are stable, then export
them via GET /api/watchlists?format=export and evaluate them in your own
rules engine. Console can then be stopped or left running for session
inspection.
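The export itself is a plain GET once the stack is up; a minimal sketch using the same in-container exec form as the verification steps later in this doc (the output filename is arbitrary):
sudo docker compose -f customer-compose.yml --profile console exec -T console \
  curl -fsS 'http://localhost:3950/api/watchlists?format=export' \
  > watchlists-export.json
# Feed watchlists-export.json to your own rules engine; Console can then be
# stopped without affecting Ocular scoring.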
Console durability characteristics to factor in: SQLite-backed storage
with 7-day default retention (RETENTION_DAYS). Webhook delivery is an
in-process outbox with at-least-once semantics and exponential backoff up
to ~1 hour before giving up. Console availability doesn't affect Ocular's
scoring — a Console outage is visibility-only.
Step 4 — Start and watch it warm up
sudo docker compose -f customer-compose.yml --profile console up -d
Ocular takes 25-60 seconds to reach healthy on a warm host, longer on
a cold one. The compose file sets start_period: 120s so Docker doesn't
mark it unhealthy prematurely. Don't panic if it shows starting for a
minute; that's expected.
During this window, the logs progress through (in order):
- License validation — line begins License valid: with org and tier.
- CUDA platform detection.
- Model loading from /models/ocular.
- Checkpoint shard load.
- Head load + calibration data from /models/heads.
- Warmup pass.
- Service ready — Uvicorn running on http://0.0.0.0:8080.
If license validation fails (step 1), see Troubleshooting — license errors.
If you used --profile console, Console starts only after Ocular is
healthy. This is intentional: Console's dependency on Ocular means starting
it first would produce confusing "upstream unreachable" errors during the
warmup window. You'll see Console as starting until Ocular turns green.
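If you're scripting the bring-up (provisioning, CI), you can poll the health endpoint instead of watching logs; a minimal sketch, assuming jq is installed on the host as in the verification steps below:
# Wait up to ~5 minutes for Ocular to report status "ok".
for i in $(seq 1 60); do
  status=$(sudo docker compose -f customer-compose.yml exec -T ocular \
    curl -fsS http://localhost:8080/health 2>/dev/null | jq -r '.status' 2>/dev/null)
  if [ "$status" = "ok" ]; then echo "Ocular healthy"; break; fi
  sleep 5
done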
Watch progress:
# Combined log tail (both services)
sudo docker compose -f customer-compose.yml --profile console logs -f
# Just Ocular:
sudo docker logs -f $(sudo docker compose -f customer-compose.yml ps -q ocular)
# Health status at a glance:
sudo docker compose -f customer-compose.yml ps
Step 5 — Verify it works
Shallow health
The stock customer-compose.yml does not publish container ports to
the host by default (the ports: lines are commented out so that most
customers running their app in the same compose project can reach Ocular
via its internal DNS name with nothing exposed externally). That means
curl http://localhost:8080/... from the host won't connect out of the
box.
Two equally-good ways to hit the service:
Option A — run curl inside the container (works regardless of whether ports are published):
sudo docker compose -f customer-compose.yml exec ocular \
curl -fsS http://localhost:8080/health | jq .
Option B — publish ports to the host (if you're running the sanity
check from your laptop, or want to integrate with a non-compose app).
Uncomment the ports: block under ocular in customer-compose.yml,
restart, then:
curl -fsS http://localhost:8080/health | jq .
The README.md in this bundle also covers a compose-override pattern
(ports-override.yml) for keeping the default compose file pristine
while still publishing during testing.
Expected output (either way):
{
"status": "ok",
"mode": "local",
"version": "<version>",
"queue_depth": 0
}
If you get {"status": "starting", ...}, Ocular is still warming up. Wait
and retry.
First /classify call
sudo docker compose -f customer-compose.yml exec ocular \
curl -fsS -X POST http://localhost:8080/classify \
-H 'Content-Type: application/json' \
-d '{"messages":[{"role":"user","content":"I feel hopeless and do not want to be here"}]}' \
| jq '{verdict, subject, risks: .risks.suicide, imminence, meta}'
Expected output (shape — numbers may differ):
{
"verdict": "danger",
"subject": "self",
"risks": { "level": "critical", "score": 0.85 },
"imminence": { "level": "critical", "score": 0.62 },
"meta": {
"version": "<version>",
"inference_ms": 28,
"windowed": false,
"windows": 1,
"request_id": "abc12345"
}
}
If you see "verdict": "clear" on obvious-crisis input like the above,
the scoring pipeline probably isn't working correctly. Check the logs
for errors (docker compose -f customer-compose.yml logs ocular); verify OCULAR_VERSION + the
loaded image tag match; if the problem persists after a clean reload,
contact support with the request_id from the response headers.
(The verdict floor varies slightly across model versions and input
phrasing — "watch" instead of "danger" on this specific string
isn't necessarily a bug. What's definitely wrong is "clear" with
risks.suicide.score near zero on obvious-crisis text.)
For the full response shape (all 8 risk axes + 4 AI-concern axes +
signals[] + stability diagnostics), drop the | jq filter; the raw
JSON is documented in api-reference.md.
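Trajectory mode (the per_turn=true sizing column earlier) is selected per request. A hedged sketch follows; it assumes per_turn and stride are top-level request fields, matching how the sizing and benchmark notes name them, so confirm the exact schema in api-reference.md before wiring this into your app:
sudo docker compose -f customer-compose.yml exec ocular \
  curl -fsS -X POST http://localhost:8080/classify \
  -H 'Content-Type: application/json' \
  -d '{
        "messages": [
          {"role":"user","content":"I have been feeling really low lately"},
          {"role":"assistant","content":"I am sorry to hear that. Do you want to talk about it?"},
          {"role":"user","content":"Some days I wonder if anyone would even notice if I was gone"}
        ],
        "per_turn": true,
        "stride": 3
      }' | jq '{verdict, meta}'
# per_turn / stride as request fields are an assumption based on the sizing
# section; in trajectory mode the response should additionally carry one score
# per speaker boundary.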
Sanity check: benign inputs stay low
Same two options as above — inside the container via docker compose exec, or from the host if you published ports. Showing the exec form
since it works either way:
sudo docker compose -f customer-compose.yml exec ocular \
curl -s -X POST http://localhost:8080/classify \
-H 'Content-Type: application/json' \
-d '{"text":"User: Hello.","detail":true}' \
| jq '{verdict, suicide: .risks.suicide, top_scores: (.detail.scores | to_entries | sort_by(-.value) | .[0:3])}'
Expected shape (numbers may vary slightly on your hardware):
{
"verdict": "clear",
"suicide": {"level": "minimal", "score": 0.02},
"top_scores": [
{ "key": "signal_XXXX", "value": 0.24 },
{ "key": "signal_YYYY", "value": 0.12 },
{ "key": "signal_ZZZZ", "value": 0.08 }
]
}
What you want to see: verdict is clear, risks.suicide.score well
below 0.1, top scores trailing off smoothly, no head pegged at 1.0.
A bare greeting is a degenerate input, so small non-zero scores are normal
— but broadly saturated scores (multiple heads at 1.0) indicate a build
problem. If you see that, contact support; do not ship the deployment to
production.
A benign conversational input should score even lower. Try
"User: can you help me plan a weekend trip to Edinburgh?" —
risks.suicide.score should be essentially zero (< 0.02).
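For completeness, the same exec form with that input; expected result is verdict clear and a near-zero suicide score:
sudo docker compose -f customer-compose.yml exec ocular \
  curl -s -X POST http://localhost:8080/classify \
  -H 'Content-Type: application/json' \
  -d '{"text":"User: can you help me plan a weekend trip to Edinburgh?"}' \
  | jq '{verdict, suicide: .risks.suicide.score}'
# → { "verdict": "clear", "suicide": 0.01 }   (exact number varies)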
Console (if running)
As with Ocular, Console's port 3950 is expose: only by default. Two
paths:
# Inside the container (works by default, no ports config needed):
sudo docker compose -f customer-compose.yml --profile console exec console \
curl -fsS http://localhost:3950/api/health | jq .
# → {"status":"ok","sessions":0}
# Deep check — verifies Console can reach Ocular too:
sudo docker compose -f customer-compose.yml --profile console exec console \
curl -fsS 'http://localhost:3950/api/health?deep=true' | jq .
# → {"status":"ok","db":"ok","ocular":"ok","sessions":0}Or uncomment the ports: block under console in customer-compose.yml
and curl http://localhost:3950/... from the host directly.
If deep health returns "ocular": "unreachable", Console can't talk to
Ocular. Check that both are on the same Docker network
(docker compose -f customer-compose.yml ps should show both as running)
and that OCULAR_URL in the compose file
points at http://ocular:8080/classify.
End-to-end sanity check (Ocular → Console conduit push → Console session store)
Two request shapes, one gotcha.
/classify accepts both text: (a "User: ...\n\nAssistant: ..." blob) and messages: [{role, content}]. Both score equivalently. Only messages: lets Console populate the per-turn transcript when you pass log: true — text: gives Console a scored session with an empty turns[]. Failure is silent (200 OK, session row stored, transcript just empty). Use messages: for anything logged to Console. See integration-patterns.md for the full trade-off.
# Make a scored request with log=true + session_id + user_id — Ocular
# will fire-and-forget push it to Console, which stores it.
SESSION_ID="sanity-$(date +%s)"
sudo docker compose -f customer-compose.yml exec ocular \
curl -s -X POST http://localhost:8080/classify \
-H 'Content-Type: application/json' \
-d "{
\"messages\":[{\"role\":\"user\",\"content\":\"I feel hopeless and do not want to be here\"}],
\"session_id\":\"$SESSION_ID\",
\"user_id\":\"sanity-user\",
\"log\":true
}" > /dev/null
# After 1-2 seconds, Console should have the session. The response shape
# is { session_id, user_id, ..., ocular: <verbatim /classify body>, turns: [...] }
# — see console.md and api-reference.md for the full schemas.
sleep 2
sudo docker compose -f customer-compose.yml --profile console exec console \
curl -s "http://localhost:3950/api/sessions/$SESSION_ID" \
| jq '{session_id, verdict: .ocular.verdict, suicide: .ocular.risks.suicide.score, message_count}'
# → { "session_id": "sanity-...", "verdict": "danger", "suicide": 0.6, "message_count": 1 }If the session doesn't appear in Console:
- Confirm Ocular's env has OCULAR_CONSOLE_URL=http://console:3950/api/ingest (the compose file does this by default — don't override it unless you have a reason).
- Confirm you passed "log": true. session_id / user_id alone do not trigger a push — logging is explicitly opt-in.
- Check Ocular logs for Conduit push failed: entries.
If this end-to-end flow succeeds, your deployment is working.
Step 6 — Expose to your application
By default the compose file only exposes Ocular and Console on the Docker internal network. To let your application reach them:
Same Docker network as Ocular
The bundle's customer-compose.yml declares its default network with the
fixed name ocular-platform. If your app runs in Docker Compose too,
join that network as external and the services ocular and console
become reachable by name on ports 8080 and 3950:
# your-app's compose.yml
services:
your-app:
# ... your service definition ...
networks:
- default
- ocular-platform
networks:
ocular-platform:
external: true # declared by the Ocular bundle; we're joining it
Host-published ports
Two equivalent recipes — either edit the compose file in place or drop
a small override file beside it. The override-file pattern (option B)
is preferred if you want to keep customer-compose.yml pristine for
easier diffing against future release bundles.
Option A — edit customer-compose.yml in place. Uncomment the
ports: section under each service:
services:
ocular:
# ...
ports:
- "127.0.0.1:8080:8080" # bind to localhost only; put a reverse proxy in front
console:
# ...
ports:
- "127.0.0.1:3950:3950"Option B — override file. Create ports-override.yml next to the
compose file:
# ports-override.yml
services:
ocular:
ports: ["127.0.0.1:8080:8080"]
console:
ports: ["127.0.0.1:3950:3950"]Then include it in every docker compose invocation:
sudo docker compose -f customer-compose.yml -f ports-override.yml --profile console up -dDo not bind to 0.0.0.0 on a network-exposed host. Ocular has no
authentication on /classify — anyone who reaches port 8080 can use your
license quota. The standard deployment pattern is: bind to 127.0.0.1,
put your reverse proxy (nginx, Caddy, Traefik, or your cloud's load
balancer) in front, handle TLS + auth at the proxy layer.
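As one concrete shape of that pattern, here is a minimal nginx sketch written as shell commands, assuming nginx and apache2-utils are already installed; it puts basic auth in front of the localhost-bound port and deliberately leaves TLS out (terminate TLS per your own certificate setup, or swap in Caddy/Traefik equivalents):
# Create a credential for API clients (pick a real secret).
sudo htpasswd -bc /etc/nginx/.ocular-htpasswd ocular-client 'choose-a-strong-password'
# Proxy 127.0.0.1:8080 behind basic auth on an internal port.
sudo tee /etc/nginx/conf.d/ocular-proxy.conf >/dev/null <<'EOF'
server {
    listen 8443;
    location / {
        auth_basic "Ocular";
        auth_basic_user_file /etc/nginx/.ocular-htpasswd;
        proxy_pass http://127.0.0.1:8080;
    }
}
EOF
sudo nginx -t && sudo systemctl reload nginx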
VPC / private network
On AWS/GCP/Azure: leave ports: commented (internal only) and give the
application and Ocular the same VPC + security group / firewall rule. Your
app resolves the Docker host's private IP and hits port 8080 there.
Step 7 — Exposing Console (optional)
Console runs on port 3950. Published to the public internet it would
be unauthenticated (Console has no built-in auth layer — same trust
model as Ocular). Typical exposure patterns:
SSH tunnel (simplest, single-user)
ssh -L 3950:localhost:3950 your-ocular-host
# then open http://localhost:3950 in your browser
No infrastructure to set up. Fits occasional single-user access.
Reverse proxy behind your SSO
If you already run nginx / Caddy / Traefik with OIDC or SAML in front of internal tooling, add Console as another upstream:
console.internal.your-company.com → http://ocular-host:3950
Gate it with your existing SSO. This is the right pattern for teams that already have an internal-tooling auth pattern.
Cloudflare Tunnel + Cloudflare Access (no public IP required)
If you use Cloudflare as your DNS provider and want browser access to Console from anywhere without opening port 3950 to the internet:
- Install cloudflared on the Ocular host:
  curl -sSL -o /tmp/cloudflared.deb \
    https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
  sudo dpkg -i /tmp/cloudflared.deb
- In the Cloudflare dashboard: Zero Trust → Networks → Tunnels → Create a tunnel. Copy the tunnel token.
- Install cloudflared as a systemd service with the token:
  sudo cloudflared service install <tunnel-token>
- In the tunnel's Public Hostnames tab, add:
  - Subdomain: console (or similar)
  - Domain: your Cloudflare-managed domain
  - Service: HTTP → localhost:3950
- In Zero Trust → Access → Applications, add a Self-hosted app for the hostname above. Set a policy (e.g. "Emails ending in @your-company.com") and 24 h session duration.
The tunnel opens an outbound QUIC connection from your host to Cloudflare
— no inbound ports on your host are required. Unauthenticated requests
are redirected to Cloudflare Access login before reaching Console.
Users visit console.your-company.com in a browser, sign in via the
configured identity provider, and land on Console. Survives host restarts
via systemd. Verify gating with curl -sI https://console.your-company.com/ —
an unauthenticated request should return a 302 to
cloudflareaccess.com/cdn-cgi/access/login/....
Troubleshooting
nvidia-smi works on host but not in Docker
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
returns "could not select device driver" or similar.
Cause: NVIDIA Container Toolkit isn't configured as a Docker runtime.
Fix:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
docker load fails with "no space left on device"
Something like:
Error processing tar file(exit status 1): write /.../layer.tar: no space left on device
You don't have enough disk for the 25 GB working set (image + tarball + Docker overlay fs).
Fix: free at least 40 GB or mount /var/lib/docker on a larger disk.
Ocular exits immediately with LicenseError
Possible messages (all raised by lib/license.py):
- OCULAR_LICENSE_KEY not set. Provide a valid license token via environment variable. — you didn't populate .env, or the var isn't being read.
- Invalid token format: expected 3 dot-separated parts — the token is truncated or not in OCULAR-v1.payload.signature form.
- Invalid token prefix: expected 'OCULAR-v1', got '<prefix>' — wrong token or accidentally pasted a different format.
- Signature verification failed: <detail> — the token is malformed or tampered with, or the public key baked into your image doesn't match the private key NOPE used to sign the token.
- License expired N day(s) ago (grace period ended) — you're past the 72-hour grace period after the expiry timestamp. Contact NOPE for renewal.
If your token is within 14 days of expiry, Ocular still starts but
logs License valid: ... status=expiring. If it's past exp but within
the 72-hour grace window, Ocular starts with status=grace and logs
remaining hours. Only after the grace period is startup refused.
Only the token's expiry (exp) is enforced at runtime. Volume-related
contract terms (scoring volume, user count, concurrent containers, etc.)
are not encoded in the token and not checked by the running container —
they're audited contractually.
Verify the env var is visible to the container:
sudo docker compose -f customer-compose.yml config | grep -i license
If it's empty, check that .env is in the same directory as
customer-compose.yml and contains OCULAR_LICENSE_KEY=OCULAR-v1....
Ocular logs reference a missing build toolchain
These messages come from bugs in older images that have since been fixed. If you see one, you're running an outdated image.
Fix: confirm OCULAR_VERSION in .env matches the <version> that
NOPE shipped in the tarball filename, and that the tarball is the current
release. Run docker images | grep ocular: — the tag there should match
what NOPE said to expect. If you don't know the current version, contact
support.
Ocular /health returns 503 "Model still loading"
Cause: normal. The warmup takes 25-60 seconds on a warm host.
Fix: wait. If it's been more than 3 minutes, check logs for errors:
sudo docker logs $(sudo docker compose -f customer-compose.yml ps -q ocular) \
| tail -50
Console shows "ocular": "unreachable" in deep health
Cause: Console can't resolve or connect to the Ocular service.
Fix:
- Run sudo docker compose -f customer-compose.yml ps — both services should say running.
- Confirm Console's env has OCULAR_URL=http://ocular:8080/classify (compose default).
- Test connectivity: sudo docker compose -f customer-compose.yml --profile console exec console curl -fsS http://ocular:8080/health — should return Ocular's health JSON.
Compose says Console is unhealthy but curl localhost:3950/api/health works fine
Cause: the healthcheck in an older Console image uses wget which
isn't present; the container is actually healthy, Docker just can't probe
it.
Fix: make sure you're on a current Console image. The current image
uses curl for the healthcheck and has it installed.
/classify is slow (>500ms for single-message input)
Check:
- GPU isn't idle: nvidia-smi should show 15-90% util during a request burst. If GPU util is near 0, the request is routing to CPU — check docker compose -f customer-compose.yml exec ocular python3 -c "import torch; print(torch.cuda.is_available())". Should print True.
- VRAM isn't paging: nvidia-smi should show ~4-5 GB used. If it's near the max, you've hit memory pressure — reduce input size or upgrade GPU.
- You're not hitting a warmup: the first request after idle takes 1-2s for kernel dispatch caches to warm. Steady-state is ~25-50ms.
Tarball loaded but docker images shows no ocular:... entry
docker load printed Loaded image: ... but the name is different than
expected.
Cause: tarball got corrupted during transfer (SHA would have caught this) or is an older format.
Fix: re-verify the SHA; re-download if needed. If SHA passes but
docker images is wrong, contact support with the output of
docker images and sha256sum of the tarball.
Operating the deployment
Logs
# All services, last N lines, follow:
sudo docker compose -f customer-compose.yml --profile console logs --tail=200 -f
# Just Ocular:
sudo docker compose -f customer-compose.yml logs --tail=100 -f ocular
# Just Console:
sudo docker compose -f customer-compose.yml --profile console logs --tail=100 -f console
Useful greps:
# Ocular request log:
sudo docker compose -f customer-compose.yml logs ocular | grep -E 'POST /classify'
# Console ingest trace (one line per scored session received):
# Format: [ingest] s=<session_id> msgs=N virtual_turns=V trajectory=T attached=A mismatched=M out_of_bounds=B
# Healthy: mismatched=0, out_of_bounds=0, attached==T.
sudo docker compose -f customer-compose.yml --profile console logs console | grep '\[ingest\]'
Data persistence
Console writes to /app/data/console.db (SQLite). The compose file mounts
this on a Docker named volume console-data — it survives container
restarts and upgrades.
To back up:
# Console should be stopped OR you should use the SQLite backup API to
# avoid partial-write corruption. Simplest safe backup:
sudo docker compose -f customer-compose.yml --profile console exec console \
sqlite3 /app/data/console.db ".backup /app/data/backup.db"
# Then copy the backup off-container:
sudo docker cp $(sudo docker compose -f customer-compose.yml ps -q console):/app/data/backup.db \
./console-backup-$(date +%Y%m%d).db
Restore by stopping Console, replacing /app/data/console.db with the
backup, and starting Console again. It'll run migrate() on startup; if
your backup is from an older schema it will be brought forward
automatically.
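A sketch of that restore path as commands, assuming the backup file sits in the current directory and is named as in the backup step above (substitute your actual date stamp for <YYYYMMDD>):
sudo docker compose -f customer-compose.yml --profile console stop console
sudo docker cp ./console-backup-<YYYYMMDD>.db \
  $(sudo docker compose -f customer-compose.yml --profile console ps -a -q console):/app/data/console.db
sudo docker compose -f customer-compose.yml --profile console start console
# Watch the startup logs for the automatic migrate() run on older-schema backups.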
Retention
Console retention has two axes: time (how old) and size (how much). Both apply — whichever evicts first wins. By default only time is enabled.
Time retention
Old sessions are deleted automatically per the RETENTION_DAYS env var
(default 7). Sweep runs once per hour; deletion also triggers SQLite
PRAGMA incremental_vacuum, so the .db file actually shrinks instead of
just marking pages free.
To change at runtime without a restart (requires CONSOLE_MODE=full,
which the customer compose sets by default):
curl -X PATCH http://localhost:3950/api/settings \
-H 'Content-Type: application/json' \
-d '{"retentionDays": 30}'
# → {"ok": true}Range is 1-90 days; server returns 400 outside that range. The Settings
page (http://localhost:3950/settings) is a thin wrapper over the same
endpoint if you'd rather click.
Precedence:
- console_config.retention_seconds (written by API PATCH / Settings UI) — wins once set.
- Otherwise: RETENTION_DAYS env var (a bootstrap default for a fresh deploy).
- Otherwise: the 7-day hardcoded default.
Size retention
When the .db + WAL file exceeds the configured cap, Console evicts the
oldest sessions (FIFO on scored_at) in batches until back under cap,
then runs wal_checkpoint(TRUNCATE) + incremental_vacuum so the file
actually shrinks on disk. A watchdog runs every 60 seconds (faster than
the hourly time sweep because a write burst can blow past the cap inside
one hourly tick).
Set via env (bootstrap):
# 50 GB cap. 0 or unset = unlimited (default).
RETENTION_MAX_GB=50
Or at runtime:
curl -X PATCH http://localhost:3950/api/settings \
-H 'Content-Type: application/json' \
-d '{"retentionMaxGB": 50}'
# → {"ok": true}Settings page exposes the same knob with a dropdown.
Precedence mirrors time retention: DB row > env > unlimited. Same trade-off — a UI/API write sticks across restarts and supersedes the env var.
Eviction semantics: oldest sessions are evicted first, with their session_turns + session_code_occurrences + any audit/match/outbox rows older than the cutoff. Users who lose their last session are dropped. Dashboard caches are invalidated so the next page load reflects the eviction. The operator sees the eviction in logs:
[retention-size] Evicted 5000 sessions in 10 batches: 12.40 GB -> 8.21 GB (cap 10.00 GB, stopped=under-cap)
Sizing rule of thumb: typical session weight is 10–50 KB (dominated
by the full Ocular response JSON). At 200k sessions/minute and 7-day time
retention, plan for ~150 GB comfortable headroom. Set the size cap
below your volume's disk quota so Console ages out old data gracefully
instead of hitting disk full and losing writes.
Console at scale
Console is a single-process Node + SQLite appliance. It comfortably handles a few thousand sessions per second sustained on a properly-sized box (8 vCPU, 16 GB RAM, NVMe). The throughput knobs and limits are spelled out below so you can size correctly and recognise the right failure signal.
Endpoint shapes
| Endpoint | When to use | Throughput notes |
|---|---|---|
| POST /api/ingest | One session per request | Concurrent requests are coalesced into batched commits automatically. ~1k sessions/sec sustained per Console process. |
| POST /api/ingest/batch | High-volume producers that can buffer | Up to 500 pre-scored sessions in one body, single SQLite commit. Plausibly 5–10× the per-request endpoint on a beefy box. |
The single-session endpoint piles concurrent submissions into one fsync per event-loop tick, so you don't have to batch yourself to get most of the benefit. The batch endpoint exists for producers that already buffer and want one HTTP round-trip per N sessions.
The batch endpoint accepts pre-scored payloads only — every entry must
include the ocular field with the response from your earlier /classify
call. If you're scoring inline, use /api/ingest and let the coordinator
handle the amortization.
Backpressure and 429
When the in-flight queue exceeds INGEST_QUEUE_HIGH_WATERMARK (default
256) on /api/ingest, Console returns:
HTTP/1.1 429 Too Many Requests
Retry-After: 1
Content-Type: application/json
{"error": "Console ingest queue is saturated (depth >= 256). Retry after 1s ..."}
This is the producer back-off signal. The right response is honoring
Retry-After and either slowing the per-second rate or switching to
/api/ingest/batch. Hammering past 429 just refills the queue and
prolongs the saturation; in the worst case it OOMs the Console process.
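A producer-side sketch of honoring that signal; $PAYLOAD is a placeholder for your single-session ingest body, and the URL assumes your producer can reach Console (published port or shared Docker network):
# Retry on 429 a bounded number of times, sleeping per the documented
# Retry-After default of 1 second.
attempt=0
while [ "$attempt" -lt 5 ]; do
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    -X POST http://localhost:3950/api/ingest \
    -H 'Content-Type: application/json' -d "$PAYLOAD")
  if [ "$code" != "429" ]; then break; fi
  sleep 1
  attempt=$((attempt + 1))
done
echo "final status: $code"
For sustained saturation, switch to /api/ingest/batch rather than tightening this loop.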
Tunables
All values come from environment variables on the Console container, with production-safe defaults. Override only with reason.
| Variable | Default | Effect |
|---|---|---|
| INGEST_GROUP_COMMIT_MAX_BATCH | 64 | Max single-ingest closures coalesced per fsync. Bigger = better throughput, longer writer hold. |
| INGEST_QUEUE_HIGH_WATERMARK | 256 | Pending-queue depth at which /api/ingest returns 429. Bigger = more buffering for spiky producers, more memory. |
| INGEST_RETRY_AFTER_SECONDS | 1 | Value sent in Retry-After on 429. |
| INGEST_MAX_BATCH_SESSIONS | 500 | Max sessions per /api/ingest/batch request. |
| INGEST_MAX_MESSAGES | 2000 | Max messages per single ingest payload. |
| INGEST_MAX_MESSAGE_CHARS | 64000 | Max chars per message body. |
| INGEST_MAX_TRAJECTORY_POINTS | 500 | Max trajectory entries per pre-scored payload. |
| INGEST_MAX_ID_CHARS | 256 | Max chars on session_id / user_id / agent_id. |
| RETENTION_DAYS | 7 | Bootstrap time retention; superseded by Settings UI / API once set. |
| RETENTION_MAX_GB | 0 (unlimited) | Bootstrap size retention; see "Size retention" above. |
Disk sizing for Console
The Console DB stores the full Ocular response per session plus per-turn
score dicts. Typical row weight is 10–50 KB. At 200k sessions/minute and
7-day retention, plan for ~150 GB SQLite + WAL with comfortable headroom.
Halve retention to halve the budget; the WAL cap is held to 64 MB so the
database file is the dominant term. Set RETENTION_MAX_GB as a hard stop
below your volume's disk quota — Console will evict the oldest sessions
FIFO when it hits the cap rather than let the writer hit disk-full.
When to call us
A single Console process can't scale linearly forever. If you're sustaining above ~5k sessions/sec, or your dashboard analytics queries (Sessions list, Agents page) become noticeably slow, contact NOPE — at that point the right move is a different topology, not more knobs on this one.
Diagnostics export
Console can package its persisted Ocular state as an NDJSON snapshot and
hand it to NOPE for analysis (signal correlations, calibration drift,
traffic mix). The full write-up — what's in, what's deliberately left
out, the time-window picker, privacy considerations — lives in its own
doc at diagnostics.md and is reachable in Console at
http://<console-host>:3950/diagnostics.
Quick reference:
- Settings → Diagnostics Export → Open → opens the full page.
- HTTP: GET /api/diagnostics/export (requires CONSOLE_MODE=full). ?estimate=true for pre-flight row counts + byte estimate. ?since_days=N to scope to the last N days.
The file is gzip-compressed over the wire when the client sends
Accept-Encoding: gzip (every modern browser does; pass --compressed
to curl). Typical compression ratio is 10–20×.
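A sketch of the HTTP path (the Diagnostics page drives the same endpoint); the estimate call is a cheap pre-flight before pulling the real snapshot:
# Pre-flight: row counts and a byte estimate.
sudo docker compose -f customer-compose.yml --profile console exec console \
  curl -fsS 'http://localhost:3950/api/diagnostics/export?estimate=true' | jq .
# Pull the last 30 days; -T keeps the redirect clean when writing to a file.
sudo docker compose -f customer-compose.yml --profile console exec -T console \
  curl -fsS --compressed 'http://localhost:3950/api/diagnostics/export?since_days=30' \
  > diagnostics-$(date +%Y%m%d).ndjson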
Updating the deployment
When NOPE ships a new version:
# Load the new images (old images stay until removed):
zstd -d -c ocular-platform-<new-version>.tar.zst | sudo docker load
# Update .env:
sed -i "s/OCULAR_VERSION=.*/OCULAR_VERSION=<new-version>/" .env
# Restart. Console migration runs automatically on first boot of the new
# image; database survives.
sudo docker compose -f customer-compose.yml --profile console up -d
# (Optional) remove old images to reclaim disk:
sudo docker image prune # removes dangling; be careful if you have other images
If you're running at production-grade uptime, do a blue-green instead: bring up a second Ocular container on a different port, cut traffic over at your load balancer, then retire the old one. Console can be upgraded in-place since it's not on the hot path.
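Either way, once traffic is on the new container, confirm the running service reports the version you expect:
sudo docker compose -f customer-compose.yml exec ocular \
  curl -fsS http://localhost:8080/health | jq -r '.version'
# → <new-version>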
What to do next
- Integration patterns — how to wire Ocular into your app (inline, async, tee/sample, watchlist-driven, fleet) is covered in integration-patterns.md on the customer portal.
- Watchlists + webhooks — if you started Console (with --profile console), its dashboard at localhost:3950 has a sandbox for iterating on watchlist rules. Each rule can fire a webhook to your receiver.
- Production hardening — reverse proxy, TLS termination, auth layer, log shipping, backup automation. These are your call; the underlying Docker deployment is a building block, not a complete production product.
Support
For issues the above doesn't resolve: email [email protected] (or the
dedicated contact your NOPE rep gave you, if your tier includes one). When
you report, please include:
- sudo docker compose -f customer-compose.yml --profile console ps
- Last 100 lines of Ocular and Console logs
- nvidia-smi output (on the host)
- Your OCULAR_VERSION (from .env, or docker images | grep ocular:)
- Console's schema version: sudo docker compose -f customer-compose.yml --profile console exec console sqlite3 /app/data/console.db "SELECT value FROM console_config WHERE key='schema_version'"