Deploying Ocular on-prem
This document walks you from "I received the tarball and a license token"
to "the service is running and I've made my first /classify call."
Everything in it has been executed end-to-end on a clean Ubuntu 24.04
host with an Ampere-class or newer NVIDIA datacenter GPU. If a step
doesn't work for you, the
Troubleshooting section covers every failure mode we've
actually seen.
For higher-level context on what Ocular is and isn't, see the other
docs in your customer portal — risk-interpretation.md, api-reference.md,
and integration-patterns.md in particular.
What you received
From NOPE:
- ocular-platform-<version>.tar.zst — one compressed Docker tarball containing both images (ocular:<version> and ocular-console:<version>). Typical size ~8-9 GB. Model weights are baked into the image — no HuggingFace pull, no external dependency at run time.
- ocular-platform-<version>.tar.zst.sha256 — SHA256 sidecar for verification.
- A signed license token — the value for OCULAR_LICENSE_KEY. Format OCULAR-v1.<base64-payload>.<base64-signature>, ECDSA P-256, issued with a tier (pilot/production/enterprise) and an expiry.
- customer-compose.yml (included in this repo at docker/customer-compose.yml) — the one-file Docker Compose spec that starts everything.
- customer.env.example (in docker/customer.env.example) — env template.
The Ocular image contains:
- A language model with merged adapter weights, packaged for inference.
- The trained behavioral-probe heads and calibration data.
- The Python service (FastAPI) exposing /classify, /health, /manifest.
- License-enforcement cryptography (ECDSA P-256 public key baked in).
The Console image contains:
- A SvelteKit UI + SQLite-backed store.
- Ingest, watchlist, webhook, and session-detail endpoints.
- No GPU dependency, no model.
No outbound network is required for operation once the image is loaded. Console's webhook outbox will call out to your configured receivers only if you set webhooks up — otherwise Ocular and Console are entirely network-contained.
Before you start
Hardware
The supported floor depends on which mode you call. Single-pass
(per_turn=false) classifies one assessment per conversation and is
GPU-light. Trajectory (per_turn=true) emits a score at every speaker
boundary and is GPU-memory-bound on long inputs.
| Resource | Single-pass only (per_turn=false) | Trajectory mode (per_turn=true) |
|---|---|---|
| GPU | 8 GB+ VRAM, Ampere or newer datacenter-class | 24 GB+ VRAM, Ampere or newer datacenter-class |
| CPU | 4 cores | 4 cores (8+ for sustained concurrent ingest) |
| RAM | 16 GB | 16 GB (32 GB+ for sustained concurrent ingest) |
| Disk | 30 GB free (image, tarball, Console DB, working space) | 30 GB free; 100 GB+ SSD if retaining long histories |
| Network | None required at runtime; outbound only if using webhooks | (same) |
Trajectory mode peaks around 14 GiB transient working memory beyond the resident model on long conversations (~200+ turns). Cards smaller than 24 GiB VRAM may run out of memory under sustained concurrent trajectory load on long inputs; single-pass mode is unaffected at lower specs.
On a 24 GB datacenter-class GPU (A10G reference), cold boot is ~20 s,
/classify p50 is ~25 ms, and trajectory (8-message conversation,
stride=3) p50 is ~42 ms. Larger cards (A100, H100, L40) perform
proportionally better on batched workloads. Pre-Ampere GPUs are not
supported.
Host OS
- Linux (x86_64). Tested on Ubuntu 22.04 and 24.04 LTS.
- Other distros work if they can run NVIDIA Container Toolkit.
- Docker on Mac and Docker Desktop on Windows are not supported — GPU passthrough isn't compatible with Docker Desktop's VM.
- Bare metal or full-virtualization VMs only. Container-in-container environments (Vast.ai, RunPod, other OCI-pod platforms) block mount and unshare capabilities and the Ocular container will fail to start with "operation not permitted". If you don't have bare metal, use a VM with root-level Docker (AWS EC2, GCE, Hetzner Dedicated, Vultr GPU, DigitalOcean GPU Droplets).
Software prerequisites
Install in order:
# Docker Engine + Compose v2 (from Docker's official repo)
curl -fsSL https://get.docker.com | sh
sudo systemctl enable --now docker
# If you want to run docker without `sudo`, add your user to the `docker`
# group. This requires a fresh login session to take effect — running
# `usermod -aG` and then `docker load` in the same shell will fail with
# "permission denied on the Docker daemon socket" because the group
# membership didn't propagate into the current process. Either log out
# and back in, or use `sg docker -c "<command>"` to run one-offs with
# the group active. The rest of this doc uses `sudo docker` throughout
# to sidestep this entirely; drop the `sudo` if your shell already has
# docker-group membership.
# sudo usermod -aG docker $USER # then re-login
# NVIDIA Container Toolkit (the `--gpus all` plumbing)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
| sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -fsSL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
| sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
| sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Utilities
sudo apt-get install -y zstd curl ca-certificates
Sanity check:
# On host — shows your GPU
nvidia-smi
# Inside a container — proves the runtime wiring
sudo docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
If either command doesn't return your GPU, stop here and fix the stack before proceeding. Ocular will not start without GPU access.
Step 1 — Transfer and verify the tarball
Get the platform artifacts and the customer bundle onto the host. Two common paths:
Option A — download directly on the host from the customer portal (fastest, no laptop round-trip; requires outbound HTTPS from the box):
# Replace <version> with whatever NOPE shipped you. Use curl -O or
# wget so the file is saved with its canonical name — this matters
# because `sha256sum -c` reads the filename from the sidecar and
# compares against the local filename verbatim; saving as anything
# else fails the check.
mkdir -p /opt/ocular && cd /opt/ocular
curl -fsSL --retry 5 -C - -O '<portal-platform-tarball-download-url>'
curl -fsSL --retry 5 -C - -O '<portal-platform-sha-download-url>'
curl -fsSL --retry 5 -C - -O '<portal-customer-bundle-download-url>'
tar -xzf customer-bundle-<version>.tar.gz
cd customer-bundle-<version>
Option B — scp from your laptop (if your host has no outbound network, or you've already pulled the artifacts locally):
scp ocular-platform-<version>.tar.zst root@your-box:/opt/ocular/
scp ocular-platform-<version>.tar.zst.sha256 root@your-box:/opt/ocular/
scp customer-bundle-<version>.tar.gz root@your-box:/opt/ocular/
# Then on the host:
cd /opt/ocular && tar -xzf customer-bundle-<version>.tar.gz
cd customer-bundle-<version>
The customer bundle unpacks into a subdirectory named
customer-bundle-<version>/ — every subsequent command in this doc
assumes you've cd'd into that directory (the compose file lives
there, the env template lives there, this very doc is there).
Verify the SHA256 before loading:
# From /opt/ocular (where the tarball + sidecar sit):
sha256sum -c ocular-platform-<version>.tar.zst.sha256
# → ocular-platform-<version>.tar.zst: OK
If verification fails, do not proceed. The image is not safe to load. Re-download and recheck; contact support if the problem persists.
Load both images into Docker:
zstd -d -c /opt/ocular/ocular-platform-<version>.tar.zst | sudo docker load
# → Loaded image: ocular:<version>
# → Loaded image: ocular-console:<version>
This takes ~2-3 minutes on modern hardware; most of it is disk write.
Confirm the images are present:
sudo docker images | grep -E 'ocular(-console)?:<version>'
The tarball can be deleted after load if disk is tight:
rm ocular-platform-<version>.tar.zst
Step 2 — Configure environment
Copy the env template and fill in the required values:
cp customer.env.example .env
chmod 600 .env  # contains secrets
The rename is two conventions stacked: Docker Compose auto-loads a file
literally named .env (no --env-file flag needed for the bare
docker compose -f customer-compose.yml ... invocations used throughout
this doc), and the .example suffix is the standard marker for "template,
safe to commit" — the live .env should be gitignored and chmod 600,
since it'll hold your license token.
Edit .env:
# Must match the tag baked into the images you loaded.
# Get this from `docker images` output or the filename of the tarball.
OCULAR_VERSION=<version>
# Your signed license token. Required — Ocular refuses to start without it.
# Store this only in .env, never commit to git, rotate when NOPE re-issues.
OCULAR_LICENSE_KEY=OCULAR-v1.<base64-payload>.<base64-signature>
# Optional: Console session retention (default 7 days).
RETENTION_DAYS=7
# Optional: upstream transcript provider (leave unset if you don't have one).
# PROVIDER_URL=http://your-app.internal:8000
On the license token: keep it in .env (mode 600). Do not bake it into
the image, commit it to source, or put it in a Docker build ARG. If it
leaks you need to contact NOPE to rotate.
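A quick way to confirm the token landed in .env without echoing the whole secret into your terminal or shell history (a minimal sketch; adjust the path if you keep .env elsewhere):
# Prints only the prefix if the line exists; no output means the key is
# missing or malformed.
grep -o '^OCULAR_LICENSE_KEY=OCULAR-v1' .env
# → OCULAR_LICENSE_KEY=OCULAR-v1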
Step 3 — Decide your deployment mode
customer-compose.yml supports three usage modes. The first two are selected by
whether you pass --profile console to docker compose; the third uses the same
invocation as the operational mode and differs only in how you use Console.
Ocular only, no Console
Your app calls Ocular /classify directly. Console never starts. No
webhooks, no operator UI. Simplest deployment; lowest resource footprint.
sudo docker compose -f customer-compose.yml up -d
Use this if:
- You already have a monitoring/alerting pipeline and just need scoring.
- You want to minimize attack surface and operational surface.
- You're piloting the scoring API before deciding on Console.
Ocular + Console (operational)
Full stack. Console receives scored sessions via a conduit push from Ocular, evaluates your configured watchlists, and can fire webhooks. Includes an operator UI.
sudo docker compose -f customer-compose.yml --profile console up -d
Use this if:
- You want session-level history + UI for operators/reviewers.
- You want configurable watchlist rules with webhook delivery.
- You need to iterate on detection rules before wiring your own pipeline.
CONSOLE_MODE=full in customer-compose.yml enables Console's
mutating endpoints (creating/editing watchlists, changing retention, CSV
export). The Console image defaults to readonly if the var is absent —
that's a fail-closed default. The compose file sets full explicitly so
an operational deploy has the full UI out of the box.
Design in Console, export to your own pipeline
Same compose invocation as the operational mode — the difference is
intent: you run Console until your watchlist rules are stable, then export
them via GET /api/watchlists?format=export and evaluate them in your own
rules engine. Console can then be stopped or left running for session
inspection.
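The export itself is a plain GET once the stack is up; a minimal sketch using the same in-container exec form as the verification steps later in this doc (the output filename is arbitrary):
sudo docker compose -f customer-compose.yml --profile console exec -T console \
  curl -fsS 'http://localhost:3950/api/watchlists?format=export' \
  > watchlists-export.json
# Feed watchlists-export.json to your own rules engine; Console can then be
# stopped without affecting Ocular scoring.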
Console durability characteristics to factor in: SQLite-backed storage
with 7-day default retention (RETENTION_DAYS). Webhook delivery is an
in-process outbox with at-least-once semantics and exponential backoff up
to ~1 hour before giving up. Console availability doesn't affect Ocular's
scoring — a Console outage is visibility-only.
Step 4 — Start and watch it warm up
sudo docker compose -f customer-compose.yml --profile console up -d
Ocular takes 25-60 seconds to reach healthy on a warm host, longer on
a cold one. The compose file sets start_period: 120s so Docker doesn't
mark it unhealthy prematurely. Don't panic if it shows starting for a
minute; that's expected.
During this window, the logs progress through (in order):
- License validation — line begins License valid: with org and tier.
- CUDA platform detection.
- Model loading from /models/ocular.
- Checkpoint shard load.
- Head load + calibration data from /models/heads.
- Warmup pass.
- Service ready — Uvicorn running on http://0.0.0.0:8080.
If license validation fails (step 1), see Troubleshooting — license errors.
If you used --profile console, Console starts only after Ocular is
healthy. This is intentional: Console's dependency on Ocular means starting
it first would produce confusing "upstream unreachable" errors during the
warmup window. You'll see Console as starting until Ocular turns green.
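If you're scripting the bring-up (provisioning, CI), you can poll the health endpoint instead of watching logs; a minimal sketch, assuming jq is installed on the host as in the verification steps below:
# Wait up to ~5 minutes for Ocular to report status "ok".
for i in $(seq 1 60); do
  status=$(sudo docker compose -f customer-compose.yml exec -T ocular \
    curl -fsS http://localhost:8080/health 2>/dev/null | jq -r '.status' 2>/dev/null)
  if [ "$status" = "ok" ]; then echo "Ocular healthy"; break; fi
  sleep 5
done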
Watch progress:
# Combined log tail (both services)
sudo docker compose -f customer-compose.yml --profile console logs -f
# Just Ocular:
sudo docker logs -f $(sudo docker compose -f customer-compose.yml ps -q ocular)
# Health status at a glance:
sudo docker compose -f customer-compose.yml ps
Step 5 — Verify it works
Shallow health
The stock customer-compose.yml does not publish container ports to
the host by default (the ports: lines are commented out so that most
customers running their app in the same compose project can reach Ocular
via its internal DNS name with nothing exposed externally). That means
curl http://localhost:8080/... from the host won't connect out of the
box.
Two equally-good ways to hit the service:
Option A — run curl inside the container (works regardless of whether ports are published):
sudo docker compose -f customer-compose.yml exec ocular \
curl -fsS http://localhost:8080/health | jq .
Option B — publish ports to the host (if you're running the sanity
check from your laptop, or want to integrate with a non-compose app).
Uncomment the ports: block under ocular in customer-compose.yml,
restart, then:
curl -fsS http://localhost:8080/health | jq .
The README.md in this bundle also covers a compose-override pattern
(ports-override.yml) for keeping the default compose file pristine
while still publishing during testing.
Expected output (either way):
{
"status": "ok",
"mode": "local",
"version": "<version>",
"queue_depth": 0
}
If you get {"status": "starting", ...}, Ocular is still warming up. Wait
and retry.
First /classify call
sudo docker compose -f customer-compose.yml exec ocular \
curl -fsS -X POST http://localhost:8080/classify \
-H 'Content-Type: application/json' \
-d '{"messages":[{"role":"user","content":"I feel hopeless and do not want to be here"}]}' \
| jq '{verdict, subject, risks: .risks.suicide, imminence, meta}'
Expected output (shape — numbers may differ):
{
"verdict": "danger",
"subject": "self",
"risks": { "level": "critical", "score": 0.85 },
"imminence": { "level": "critical", "score": 0.62 },
"meta": {
"version": "<version>",
"inference_ms": 28,
"windowed": false,
"windows": 1,
"request_id": "abc12345"
}
}
If you see "verdict": "clear" on obvious-crisis input like the above,
the scoring pipeline probably isn't working correctly. Check the logs
for errors (docker compose -f customer-compose.yml logs ocular); verify OCULAR_VERSION + the
loaded image tag match; if the problem persists after a clean reload,
contact support with the request_id from the response headers.
(The verdict floor varies slightly across model versions and input
phrasing — "watch" instead of "danger" on this specific string
isn't necessarily a bug. What's definitely wrong is "clear" with
risks.suicide.score near zero on obvious-crisis text.)
For the full response shape (all 8 risk axes + 4 AI-concern axes +
signals[] + stability diagnostics), drop the | jq filter; the raw
JSON is documented in api-reference.md.
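Trajectory mode (the per_turn=true sizing column earlier) is selected per request. A hedged sketch follows; it assumes per_turn and stride are top-level request fields, matching how the sizing and benchmark notes name them, so confirm the exact schema in api-reference.md before wiring this into your app:
sudo docker compose -f customer-compose.yml exec ocular \
  curl -fsS -X POST http://localhost:8080/classify \
  -H 'Content-Type: application/json' \
  -d '{
        "messages": [
          {"role":"user","content":"I have been feeling really low lately"},
          {"role":"assistant","content":"I am sorry to hear that. Do you want to talk about it?"},
          {"role":"user","content":"Some days I wonder if anyone would even notice if I was gone"}
        ],
        "per_turn": true,
        "stride": 3
      }' | jq '{verdict, meta}'
# per_turn / stride as request fields are an assumption based on the sizing
# section; in trajectory mode the response should additionally carry one score
# per speaker boundary.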
Sanity check: benign inputs stay low
Same two options as above — inside the container via docker compose exec, or from the host if you published ports. Showing the exec form
since it works either way:
sudo docker compose -f customer-compose.yml exec ocular \
curl -s -X POST http://localhost:8080/classify \
-H 'Content-Type: application/json' \
-d '{"text":"User: Hello.","detail":true}' \
| jq '{verdict, suicide: .risks.suicide, top_scores: (.detail.scores | to_entries | sort_by(-.value) | .[0:3])}'
Expected shape (numbers may vary slightly on your hardware):
{
"verdict": "clear",
"suicide": {"level": "minimal", "score": 0.02},
"top_scores": [
{ "key": "signal_XXXX", "value": 0.24 },
{ "key": "signal_YYYY", "value": 0.12 },
{ "key": "signal_ZZZZ", "value": 0.08 }
]
}
What you want to see: verdict is clear, risks.suicide.score well
below 0.1, top scores trailing off smoothly, no head pegged at 1.0.
A bare greeting is a degenerate input, so small non-zero scores are normal
— but broadly saturated scores (multiple heads at 1.0) indicate a build
problem. If you see that, contact support; do not ship the deployment to
production.
A benign conversational input should score even lower. Try
"User: can you help me plan a weekend trip to Edinburgh?" —
risks.suicide.score should be essentially zero (< 0.02).
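For completeness, the same exec form with that input; expected result is verdict clear and a near-zero suicide score:
sudo docker compose -f customer-compose.yml exec ocular \
  curl -s -X POST http://localhost:8080/classify \
  -H 'Content-Type: application/json' \
  -d '{"text":"User: can you help me plan a weekend trip to Edinburgh?"}' \
  | jq '{verdict, suicide: .risks.suicide.score}'
# → { "verdict": "clear", "suicide": 0.01 }   (exact number varies)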
Console (if running)
As with Ocular, Console's port 3950 is expose: only by default. Two
paths:
# Inside the container (works by default, no ports config needed):
sudo docker compose -f customer-compose.yml --profile console exec console \
curl -fsS http://localhost:3950/api/health | jq .
# → {"status":"ok","sessions":0}
# Deep check — verifies Console can reach Ocular too:
sudo docker compose -f customer-compose.yml --profile console exec console \
curl -fsS 'http://localhost:3950/api/health?deep=true' | jq .
# → {"status":"ok","db":"ok","ocular":"ok","sessions":0}Or uncomment the ports: block under console in customer-compose.yml
and curl http://localhost:3950/... from the host directly.
If deep health returns "ocular": "unreachable", Console can't talk to
Ocular. Check that both are on the same Docker network
(docker compose -f customer-compose.yml ps should show both as running)
and that OCULAR_URL in the compose file
points at http://ocular:8080/classify.
End-to-end sanity check (Ocular → Console conduit push → Console session store)
Two request shapes, one gotcha.
/classify accepts both text: (a "User: ...\n\nAssistant: ..." blob) and messages: [{role, content}]. Both score equivalently. Only messages: lets Console populate the per-turn transcript when you pass log: true — text: gives Console a scored session with an empty turns[]. Failure is silent (200 OK, session row stored, transcript just empty). Use messages: for anything logged to Console. See integration-patterns.md for the full trade-off.
# Make a scored request with log=true + session_id + user_id — Ocular
# will fire-and-forget push it to Console, which stores it.
SESSION_ID="sanity-$(date +%s)"
sudo docker compose -f customer-compose.yml exec ocular \
curl -s -X POST http://localhost:8080/classify \
-H 'Content-Type: application/json' \
-d "{
\"messages\":[{\"role\":\"user\",\"content\":\"I feel hopeless and do not want to be here\"}],
\"session_id\":\"$SESSION_ID\",
\"user_id\":\"sanity-user\",
\"log\":true
}" > /dev/null
# After 1-2 seconds, Console should have the session. The response shape
# is { session_id, user_id, ..., ocular: <verbatim /classify body>, turns: [...] }
# — see console.md and api-reference.md for the full schemas.
sleep 2
sudo docker compose -f customer-compose.yml --profile console exec console \
curl -s "http://localhost:3950/api/sessions/$SESSION_ID" \
| jq '{session_id, verdict: .ocular.verdict, suicide: .ocular.risks.suicide.score, message_count}'
# → { "session_id": "sanity-...", "verdict": "danger", "suicide": 0.6, "message_count": 1 }If the session doesn't appear in Console:
- Confirm Ocular's env has OCULAR_CONSOLE_URL=http://console:3950/api/ingest (the compose file does this by default — don't override it unless you have a reason).
- Confirm you passed "log": true. session_id / user_id alone do not trigger a push — logging is explicitly opt-in.
- Check Ocular logs for Conduit push failed: entries.
If this end-to-end flow succeeds, your deployment is working.
Step 6 — Expose to your application
By default the compose file only exposes Ocular and Console on the Docker internal network. To let your application reach them:
Same Docker network as Ocular
The bundle's customer-compose.yml declares its default network with the
fixed name ocular-platform. If your app runs in Docker Compose too,
join that network as external and the services ocular and console
become reachable by name on ports 8080 and 3950:
# your-app's compose.yml
services:
your-app:
# ... your service definition ...
networks:
- default
- ocular-platform
networks:
ocular-platform:
external: true # declared by the Ocular bundle; we're joining it
Host-published ports
Two equivalent recipes — either edit the compose file in place or drop
a small override file beside it. The override-file pattern (option B)
is preferred if you want to keep customer-compose.yml pristine for
easier diffing against future release bundles.
Option A — edit customer-compose.yml in place. Uncomment the
ports: section under each service:
services:
ocular:
# ...
ports:
- "127.0.0.1:8080:8080" # bind to localhost only; put a reverse proxy in front
console:
# ...
ports:
- "127.0.0.1:3950:3950"Option B — override file. Create ports-override.yml next to the
compose file:
# ports-override.yml
services:
ocular:
ports: ["127.0.0.1:8080:8080"]
console:
ports: ["127.0.0.1:3950:3950"]Then include it in every docker compose invocation:
sudo docker compose -f customer-compose.yml -f ports-override.yml --profile console up -dDo not bind to 0.0.0.0 on a network-exposed host. Ocular has no
authentication on /classify — anyone who reaches port 8080 can use your
license quota. The standard deployment pattern is: bind to 127.0.0.1,
put your reverse proxy (nginx, Caddy, Traefik, or your cloud's load
balancer) in front, handle TLS + auth at the proxy layer.
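As one concrete shape of that pattern, here is a minimal nginx sketch written as shell commands, assuming nginx and apache2-utils are already installed; it puts basic auth in front of the localhost-bound port and deliberately leaves TLS out (terminate TLS per your own certificate setup, or swap in Caddy/Traefik equivalents):
# Create a credential for API clients (pick a real secret).
sudo htpasswd -bc /etc/nginx/.ocular-htpasswd ocular-client 'choose-a-strong-password'
# Proxy 127.0.0.1:8080 behind basic auth on an internal port.
sudo tee /etc/nginx/conf.d/ocular-proxy.conf >/dev/null <<'EOF'
server {
    listen 8443;
    location / {
        auth_basic "Ocular";
        auth_basic_user_file /etc/nginx/.ocular-htpasswd;
        proxy_pass http://127.0.0.1:8080;
    }
}
EOF
sudo nginx -t && sudo systemctl reload nginx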
VPC / private network
On AWS/GCP/Azure: leave ports: commented (internal only) and give the
application and Ocular the same VPC + security group / firewall rule. Your
app resolves the Docker host's private IP and hits port 8080 there.
Step 7 — Exposing Console (optional)
Console runs on port 3950. Published to the public internet it would
be unauthenticated (Console has no built-in auth layer — same trust
model as Ocular). Typical exposure patterns:
SSH tunnel (simplest, single-user)
ssh -L 3950:localhost:3950 your-ocular-host
# then open http://localhost:3950 in your browser
No infrastructure to set up. Fits occasional single-user access.
Reverse proxy behind your SSO
If you already run nginx / Caddy / Traefik with OIDC or SAML in front of internal tooling, add Console as another upstream:
console.internal.your-company.com → http://ocular-host:3950
Gate it with your existing SSO. This is the right pattern for teams that already have an internal-tooling auth pattern.
Cloudflare Tunnel + Cloudflare Access (no public IP required)
If you use Cloudflare as your DNS provider and want browser access to Console from anywhere without opening port 3950 to the internet:
- Install cloudflared on the Ocular host:
  curl -sSL -o /tmp/cloudflared.deb \
    https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
  sudo dpkg -i /tmp/cloudflared.deb
- In the Cloudflare dashboard: Zero Trust → Networks → Tunnels → Create a tunnel. Copy the tunnel token.
- Install cloudflared as a systemd service with the token:
  sudo cloudflared service install <tunnel-token>
- In the tunnel's Public Hostnames tab, add:
  - Subdomain: console (or similar)
  - Domain: your Cloudflare-managed domain
  - Service: HTTP → localhost:3950
- In Zero Trust → Access → Applications, add a Self-hosted app for the hostname above. Set a policy (e.g. "Emails ending in @your-company.com") and 24 h session duration.
The tunnel opens an outbound QUIC connection from your host to Cloudflare
— no inbound ports on your host are required. Unauthenticated requests
are redirected to Cloudflare Access login before reaching Console.
Users visit console.your-company.com in a browser, sign in via the
configured identity provider, and land on Console. Survives host restarts
via systemd. Verify gating with curl -sI https://console.your-company.com/ —
an unauthenticated request should return a 302 to
cloudflareaccess.com/cdn-cgi/access/login/....
Troubleshooting
nvidia-smi works on host but not in Docker
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
returns "could not select device driver" or similar.
Cause: NVIDIA Container Toolkit isn't configured as a Docker runtime.
Fix:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
docker load fails with "no space left on device"
Something like:
Error processing tar file(exit status 1): write /.../layer.tar: no space left on device
You don't have enough disk for the 25 GB working set (image + tarball + Docker overlay fs).
Fix: free at least 40 GB or mount /var/lib/docker on a larger disk.
Ocular exits immediately with LicenseError
Possible messages (all raised by lib/license.py):
- OCULAR_LICENSE_KEY not set. Provide a valid license token via environment variable. — you didn't populate .env, or the var isn't being read.
- Invalid token format: expected 3 dot-separated parts — the token is truncated or not in OCULAR-v1.payload.signature form.
- Invalid token prefix: expected 'OCULAR-v1', got '<prefix>' — wrong token or accidentally pasted a different format.
- Signature verification failed: <detail> — the token is malformed or tampered with, or the public key baked into your image doesn't match the private key NOPE used to sign the token.
- License expired N day(s) ago (grace period ended) — you're past the 72-hour grace period after the expiry timestamp. Contact NOPE for renewal.
If your token is within 14 days of expiry, Ocular still starts but
logs License valid: ... status=expiring. If it's past exp but within
the 72-hour grace window, Ocular starts with status=grace and logs
remaining hours. Only after the grace period is startup refused.
Only the token's expiry (exp) is enforced at runtime. Volume-related
contract terms (scoring volume, user count, concurrent containers, etc.)
are not encoded in the token and not checked by the running container —
they're audited contractually.
Verify the env var is visible to the container:
sudo docker compose -f customer-compose.yml config | grep -i license
If it's empty, check that .env is in the same directory as
customer-compose.yml and contains OCULAR_LICENSE_KEY=OCULAR-v1....
Ocular logs reference a missing build toolchain
These messages come from bugs in older images that have since been fixed. If you see one, you're running an outdated image.
Fix: confirm OCULAR_VERSION in .env matches the <version> that
NOPE shipped in the tarball filename, and that the tarball is the current
release. Run docker images | grep ocular: — the tag there should match
what NOPE said to expect. If you don't know the current version, contact
support.
Ocular /health returns 503 "Model still loading"
Cause: normal. The warmup takes 25-60 seconds on a warm host.
Fix: wait. If it's been more than 3 minutes, check logs for errors:
sudo docker logs $(sudo docker compose -f customer-compose.yml ps -q ocular) \
| tail -50
Console shows "ocular": "unreachable" in deep health
Cause: Console can't resolve or connect to the Ocular service.
Fix:
- Run sudo docker compose -f customer-compose.yml ps — both services should say running.
- Confirm Console's env has OCULAR_URL=http://ocular:8080/classify (compose default).
- Test connectivity: sudo docker compose -f customer-compose.yml --profile console exec console curl -fsS http://ocular:8080/health — should return Ocular's health JSON.
Compose says Console is unhealthy but curl localhost:3950/api/health works fine
Cause: the healthcheck in an older Console image uses wget which
isn't present; the container is actually healthy, Docker just can't probe
it.
Fix: make sure you're on a current Console image. The current image
uses curl for the healthcheck and has it installed.
/classify is slow (>500ms for single-message input)
Check:
- GPU isn't idle: nvidia-smi should show 15-90% util during a request burst. If GPU util is near 0, the request is routing to CPU — check docker compose -f customer-compose.yml exec ocular python3 -c "import torch; print(torch.cuda.is_available())". Should print True.
- VRAM isn't paging: nvidia-smi should show ~4-5 GB used. If it's near the max, you've hit memory pressure — reduce input size or upgrade GPU.
- You're not hitting a warmup: the first request after idle takes 1-2s for kernel dispatch caches to warm. Steady-state is ~25-50ms.
Tarball loaded but docker images shows no ocular:... entry
docker load printed Loaded image: ... but the name is different than
expected.
Cause: tarball got corrupted during transfer (SHA would have caught this) or is an older format.
Fix: re-verify the SHA; re-download if needed. If SHA passes but
docker images is wrong, contact support with the output of
docker images and sha256sum of the tarball.
Operating the deployment
Logs
# All services, last N lines, follow:
sudo docker compose -f customer-compose.yml --profile console logs --tail=200 -f
# Just Ocular:
sudo docker compose -f customer-compose.yml logs --tail=100 -f ocular
# Just Console:
sudo docker compose -f customer-compose.yml --profile console logs --tail=100 -f console
Useful greps:
# Ocular request log:
sudo docker compose -f customer-compose.yml logs ocular | grep -E 'POST /classify'
# Console ingest trace (one line per scored session received):
# Format: [ingest] s=<session_id> msgs=N virtual_turns=V trajectory=T attached=A mismatched=M out_of_bounds=B
# Healthy: mismatched=0, out_of_bounds=0, attached==T.
sudo docker compose -f customer-compose.yml --profile console logs console | grep '\[ingest\]'
Data persistence
Console writes to /app/data/console.db (SQLite). The compose file mounts
this on a Docker named volume console-data — it survives container
restarts and upgrades.
To back up:
# Console should be stopped OR you should use the SQLite backup API to
# avoid partial-write corruption. Simplest safe backup:
sudo docker compose -f customer-compose.yml --profile console exec console \
sqlite3 /app/data/console.db ".backup /app/data/backup.db"
# Then copy the backup off-container:
sudo docker cp $(sudo docker compose -f customer-compose.yml ps -q console):/app/data/backup.db \
./console-backup-$(date +%Y%m%d).db
Restore by stopping Console, replacing /app/data/console.db with the
backup, and starting Console again. It'll run migrate() on startup; if
your backup is from an older schema it will be brought forward
automatically.
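A sketch of that restore path as commands, assuming the backup file sits in the current directory and is named as in the backup step above (substitute your actual date stamp for <YYYYMMDD>):
sudo docker compose -f customer-compose.yml --profile console stop console
sudo docker cp ./console-backup-<YYYYMMDD>.db \
  $(sudo docker compose -f customer-compose.yml --profile console ps -a -q console):/app/data/console.db
sudo docker compose -f customer-compose.yml --profile console start console
# Watch the startup logs for the automatic migrate() run on older-schema backups.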
Retention
Console retention has two axes: time (how old) and size (how much). Both apply — whichever evicts first wins. By default only time is enabled.
Time retention
Old sessions are deleted automatically per the RETENTION_DAYS env var
(default 7). Sweep runs once per hour; deletion also triggers SQLite
PRAGMA incremental_vacuum, so the .db file actually shrinks instead of
just marking pages free.
To change at runtime without a restart (requires CONSOLE_MODE=full,
which the customer compose sets by default):
curl -X PATCH http://localhost:3950/api/settings \
-H 'Content-Type: application/json' \
-d '{"retentionDays": 30}'
# → {"ok": true}Range is 1-90 days; server returns 400 outside that range. The Settings
page (http://localhost:3950/settings) is a thin wrapper over the same
endpoint if you'd rather click.
Precedence:
- console_config.retention_seconds (written by API PATCH / Settings UI) — wins once set.
- Otherwise: RETENTION_DAYS env var (a bootstrap default for a fresh deploy).
- Otherwise: the 7-day hardcoded default.
Size retention
When the .db + WAL file exceeds the configured cap, Console evicts the
oldest sessions (FIFO on scored_at) in batches until back under cap,
then runs wal_checkpoint(TRUNCATE) + incremental_vacuum so the file
actually shrinks on disk. A watchdog runs every 60 seconds (faster than
the hourly time sweep because a write burst can blow past the cap inside
one hourly tick).
Set via env (bootstrap):
# 50 GB cap. 0 or unset = unlimited (default).
RETENTION_MAX_GB=50
Or at runtime:
curl -X PATCH http://localhost:3950/api/settings \
-H 'Content-Type: application/json' \
-d '{"retentionMaxGB": 50}'
# → {"ok": true}Settings page exposes the same knob with a dropdown.
Precedence mirrors time retention: DB row > env > unlimited. Same trade-off — a UI/API write sticks across restarts and supersedes the env var.
Eviction semantics: oldest sessions are evicted first, with their session_turns + session_code_occurrences + any audit/match/outbox rows older than the cutoff. Users who lose their last session are dropped. Dashboard caches are invalidated so the next page load reflects the eviction. The operator sees the eviction in logs:
[retention-size] Evicted 5000 sessions in 10 batches: 12.40 GB -> 8.21 GB (cap 10.00 GB, stopped=under-cap)
Sizing rule of thumb: typical session weight is 10–50 KB (dominated
by the full Ocular response JSON). At 200k sessions/minute and 7-day time
retention, plan for ~150 GB comfortable headroom. Set the size cap
below your volume's disk quota so Console ages out old data gracefully
instead of hitting disk full and losing writes.
Console at scale
Console is a single-process Node + SQLite appliance. It comfortably handles a few thousand sessions per second sustained on a properly-sized box (8 vCPU, 16 GB RAM, NVMe). The throughput knobs and limits are spelled out below so you can size correctly and recognise the right failure signal.
Endpoint shapes
| Endpoint | When to use | Throughput notes |
|---|---|---|
| POST /api/ingest | One session per request | Concurrent requests are coalesced into batched commits automatically. ~1k sessions/sec sustained per Console process. |
| POST /api/ingest/batch | High-volume producers that can buffer | Up to 500 pre-scored sessions in one body, single SQLite commit. Plausibly 5–10× the per-request endpoint on a beefy box. |
The single-session endpoint piles concurrent submissions into one fsync per event-loop tick, so you don't have to batch yourself to get most of the benefit. The batch endpoint exists for producers that already buffer and want one HTTP round-trip per N sessions.
The batch endpoint accepts pre-scored payloads only — every entry must
include the ocular field with the response from your earlier /classify
call. If you're scoring inline, use /api/ingest and let the coordinator
handle the amortization.
Backpressure and 429
When the in-flight queue exceeds INGEST_QUEUE_HIGH_WATERMARK (default
256) on /api/ingest, Console returns:
HTTP/1.1 429 Too Many Requests
Retry-After: 1
Content-Type: application/json
{"error": "Console ingest queue is saturated (depth >= 256). Retry after 1s ..."}
This is the producer back-off signal. The right response is honoring
Retry-After and either slowing the per-second rate or switching to
/api/ingest/batch. Hammering past 429 just refills the queue and
prolongs the saturation; in the worst case it OOMs the Console process.
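A producer-side sketch of honoring that signal; $PAYLOAD is a placeholder for your single-session ingest body, and the URL assumes your producer can reach Console (published port or shared Docker network):
# Retry on 429 a bounded number of times, sleeping per the documented
# Retry-After default of 1 second.
attempt=0
while [ "$attempt" -lt 5 ]; do
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    -X POST http://localhost:3950/api/ingest \
    -H 'Content-Type: application/json' -d "$PAYLOAD")
  if [ "$code" != "429" ]; then break; fi
  sleep 1
  attempt=$((attempt + 1))
done
echo "final status: $code"
For sustained saturation, switch to /api/ingest/batch rather than tightening this loop.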
Tunables
All values come from environment variables on the Console container, with production-safe defaults. Override only with reason.
| Variable | Default | Effect |
|---|---|---|
| INGEST_GROUP_COMMIT_MAX_BATCH | 64 | Max single-ingest closures coalesced per fsync. Bigger = better throughput, longer writer hold. |
| INGEST_QUEUE_HIGH_WATERMARK | 256 | Pending-queue depth at which /api/ingest returns 429. Bigger = more buffering for spiky producers, more memory. |
| INGEST_RETRY_AFTER_SECONDS | 1 | Value sent in Retry-After on 429. |
| INGEST_MAX_BATCH_SESSIONS | 500 | Max sessions per /api/ingest/batch request. |
| INGEST_MAX_MESSAGES | 2000 | Max messages per single ingest payload. |
| INGEST_MAX_MESSAGE_CHARS | 64000 | Max chars per message body. |
| INGEST_MAX_TRAJECTORY_POINTS | 500 | Max trajectory entries per pre-scored payload. |
| INGEST_MAX_ID_CHARS | 256 | Max chars on session_id / user_id / agent_id. |
| RETENTION_DAYS | 7 | Bootstrap time retention; superseded by Settings UI / API once set. |
| RETENTION_MAX_GB | 0 (unlimited) | Bootstrap size retention; see "Size retention" above. |
Disk sizing for Console
The Console DB stores the full Ocular response per session plus per-turn
score dicts. Typical row weight is 10–50 KB. At 200k sessions/minute and
7-day retention, plan for ~150 GB SQLite + WAL with comfortable headroom.
Halve retention to halve the budget; the WAL cap is held to 64 MB so the
database file is the dominant term. Set RETENTION_MAX_GB as a hard stop
below your volume's disk quota — Console will evict the oldest sessions
FIFO when it hits the cap rather than let the writer hit disk-full.
When to call us
A single Console process can't scale linearly forever. If you're sustaining above ~5k sessions/sec, or your dashboard analytics queries (Sessions list, Agents page) become noticeably slow, contact NOPE — at that point the right move is a different topology, not more knobs on this one.
Diagnostics export
Console can package its persisted Ocular state as an NDJSON snapshot and
hand it to NOPE for analysis (signal correlations, calibration drift,
traffic mix). The full write-up — what's in, what's deliberately left
out, the time-window picker, privacy considerations — lives in its own
doc at diagnostics.md and is reachable in Console at
http://<console-host>:3950/diagnostics.
Quick reference:
- Settings → Diagnostics Export → Open → opens the full page.
- HTTP: GET /api/diagnostics/export (requires CONSOLE_MODE=full). ?estimate=true for pre-flight row counts + byte estimate. ?since_days=N to scope to the last N days.
The file is gzip-compressed over the wire when the client sends
Accept-Encoding: gzip (every modern browser does; pass --compressed
to curl). Typical compression ratio is 10–20×.
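A sketch of the HTTP path (the Diagnostics page drives the same endpoint); the estimate call is a cheap pre-flight before pulling the real snapshot:
# Pre-flight: row counts and a byte estimate.
sudo docker compose -f customer-compose.yml --profile console exec console \
  curl -fsS 'http://localhost:3950/api/diagnostics/export?estimate=true' | jq .
# Pull the last 30 days; -T keeps the redirect clean when writing to a file.
sudo docker compose -f customer-compose.yml --profile console exec -T console \
  curl -fsS --compressed 'http://localhost:3950/api/diagnostics/export?since_days=30' \
  > diagnostics-$(date +%Y%m%d).ndjson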
Updating the deployment
When NOPE ships a new version:
# Load the new images (old images stay until removed):
zstd -d -c ocular-platform-<new-version>.tar.zst | sudo docker load
# Update .env:
sed -i "s/OCULAR_VERSION=.*/OCULAR_VERSION=<new-version>/" .env
# Restart. Console migration runs automatically on first boot of the new
# image; database survives.
sudo docker compose -f customer-compose.yml --profile console up -d
# (Optional) remove old images to reclaim disk:
sudo docker image prune # removes dangling; be careful if you have other images
If you're running at production-grade uptime, do a blue-green instead: bring up a second Ocular container on a different port, cut traffic over at your load balancer, then retire the old one. Console can be upgraded in-place since it's not on the hot path.
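Either way, once traffic is on the new container, confirm the running service reports the version you expect:
sudo docker compose -f customer-compose.yml exec ocular \
  curl -fsS http://localhost:8080/health | jq -r '.version'
# → <new-version>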
What to do next
- Integration patterns — how to wire Ocular into your app (inline, async, tee/sample, watchlist-driven, fleet) is covered in integration-patterns.md on the customer portal.
- Watchlists + webhooks — if you started Console (with --profile console), its dashboard at localhost:3950 has a sandbox for iterating on watchlist rules. Each rule can fire a webhook to your receiver.
- Production hardening — reverse proxy, TLS termination, auth layer, log shipping, backup automation. These are your call; the underlying Docker deployment is a building block, not a complete production product.
Support
For issues the above doesn't resolve: email [email protected] (or the
dedicated contact your NOPE rep gave you, if your tier includes one). When
you report, please include:
- sudo docker compose -f customer-compose.yml --profile console ps
- Last 100 lines of Ocular and Console logs
- nvidia-smi output (on the host)
- Your OCULAR_VERSION (from .env, or docker images | grep ocular:)
- Console's schema version: sudo docker compose -f customer-compose.yml --profile console exec console sqlite3 /app/data/console.db "SELECT value FROM console_config WHERE key='schema_version'"