Skip to main content

API reference

The /classify request/response contract. This page describes what's universal across all three places you can hit Ocular:

  • Cloud API at api.nope.net/v1/ocular (paid, hosted). See cloud-api for auth, rate limits, and base URL.
  • Self-hosted Ocular container (OSS or commercial). See self-host-api for operational additions: license headers, Console push, env vars, on-prem-only status codes.
  • Commercial Console deployments (<slug>.customers.nope.net). Same contract; Console wires up the conduit push automatically.

Content type: application/json for request bodies. Responses are application/json; charset=utf-8.


POST /classify

Score a conversation. Returns a continuous salience score plus per-axis signal scores, ranked head firings, and (optionally) a per-turn trajectory or full detail vectors.

Request

{
  "messages": [
    {"role": "user", "content": "I've been feeling really down"}
  ],
  "per_turn": false,
  "trajectory_stride": 3,
  "thoroughness": "auto",
  "detail": false
}

Either messages (OpenAI-style, canonical) or text (User: ...\n\nAssistant: ... format, curl-friendly) must be provided. If both are given, messages wins.

Field Type Default Meaning
messages array of {role, content} null Canonical input. role is "user" / "assistant" / "system". System messages are ignored for scoring.
text string "" User: ...\n\nAssistant: ... format. Double-newline separates turns. Equivalent to messages; use whichever is easier.
per_turn bool false Compute a trajectory: score at every Nth turn boundary. Adds latency proportional to turn count.
trajectory_stride int 3 Compute per-turn score every Nth boundary. 1 = every turn; 3 = every third turn (default).
thoroughness string "auto" "fast" scores the input once; "auto" (default) auto-escalates to a small parallel ensemble for short inputs that benefit from disambiguation; "thorough" always runs the full ensemble. See Thoroughness & confidence.
detail bool false When true, adds a detail object to the response with raw/calibrated head dicts and corroboration/contributors. Roughly doubles response size.

Some deployments accept additional fields (e.g. log, session_id, user_id for Console-paired self-host deployments). See self-host-api for those.

Response

Top-level fields (always present on a 200 response):

{
  "salience": 0.41,
  "subject": "self",
  "imminence": {"level": "moderate", "score": 0.23},
  "fiction": 0.05,
  "authenticity": 0.78,
  "signals": {
    "user": {
      "suicide":         {"level": "critical", "score": 0.73},
      "self_harm":       {"level": "moderate", "score": 0.12},
      "harm_to_others":  {"level": "minimal",  "score": 0.01},
      "abuse":           {"level": "minimal",  "score": 0.02},
      "sexual_violence": {"level": "minimal",  "score": 0.00},
      "exploitation":    {"level": "minimal",  "score": 0.01},
      "stalking":        {"level": "minimal",  "score": 0.00},
      "self_neglect":    {"level": "minimal",  "score": 0.04}
    },
    "ai": {
      "harm_provision":       {"level": "minimal", "score": 0.00},
      "emotional_failure":    {"level": "minimal", "score": 0.00},
      "manipulation":         {"level": "minimal", "score": 0.01},
      "safeguarding_failure": {"level": "minimal", "score": 0.02}
    }
  },
  "heads": [
    {"code": "USER_SUICIDE_HEAD_A", "score": 0.81},
    {"code": "USER_DESPAIR_HEAD_B", "score": 0.44}
  ],
  "thoroughness": "auto",
  "confidence": 0.91,
  "stability": {
    "user": {
      "suicide": 0.99, "self_harm": 1.0, "harm_to_others": 1.0, "abuse": 1.0,
      "sexual_violence": 1.0, "exploitation": 1.0, "stalking": 1.0, "self_neglect": 0.97
    },
    "ai": {"harm_provision": 1.0, "emotional_failure": 1.0, "manipulation": 1.0, "safeguarding_failure": 1.0},
    "imminence": 1.0
  },
  "meta": {
    "version": "1.0.0",
    "build": "c13d837",
    "inference_ms": 28,
    "request_id": "01e7982a",
    "windowed": false,
    "windows": 1
  }
}

salience — the aggregate classification

A continuous severity score in [0, 1]. Computed by Ocular's fusion layer from the per-axis signals + context modulators (fiction framing, subject attribution, cross-head corroboration, imminence). Higher is more concerning.

Use salience as the field your rules engine keys off. Pick the cutoff that fits your downstream action.

Published reference band cuts:

Band Range What it represents
Clear salience < 0.30 No elevated signals, or signals were descoped by fiction framing.
Watch 0.30 ≤ salience < 0.60 Elevated signals below Ocular's danger thresholds.
Danger salience ≥ 0.60 Elevated signals at or above Ocular's danger thresholds.

The cuts are reference defaults — they match the band display in the NOPE dashboard — not a contract. Customers commonly tune their own cutoffs against their own labelled data; see risk-interpretation for the tuning workflow.

Per-axis levels give per-category detail for UI; the salience score is the authoritative aggregate. Exact per-axis thresholds inside the fusion layer are internal and fiction-modulated — see risk-interpretation for the salience shape table.

subject

"self" (speaker is the party-at-risk described by the signals), "other" (speaker is reporting someone else's situation), or "unknown" (ambiguous). Only "self" can drive salience above the watch threshold on user-side axes — third-person disclosure changes what the signals mean.

signals — nested per-axis structure

signals is a two-key object:

  • signals.user — 8 user-side axes
  • signals.ai — 4 AI-side axes

Each axis is {level, score}. Levels share one ladder (minimal/low/moderate/high/critical); see the sections below.

signals.user — 8 user-side axes

Every axis always appears with {level, score}. score is [0, 1]. level is one of:

Value Score range
"minimal" < 0.05
"low" [0.05, 0.12)
"moderate" [0.12, 0.25)
"high" [0.25, 0.45)
"critical" ≥ 0.45

Axes: suicide, self_harm, harm_to_others, abuse, sexual_violence, exploitation, stalking, self_neglect.

signals.ai — 4 AI-side axes

Same {level, score} shape as signals.user. Axes cover the assistant side: harm_provision (providing harmful info/methods), emotional_failure (missed empathic cues), manipulation (coercive or exploitative patterns), safeguarding_failure (absence of boundary-setting / redirect behavior — not fiction-descoped for minor-involving content; fires regardless of framing).

imminence{level, score}

Temporal acuity marker. "high" means the conversation carries near-term acuity markers (plan, means, timeline, preparatory language) rather than chronic-pattern signal. Same level domain as the per-axis entries.

fiction / authenticity

Two scalars ([0, 1]):

  • fiction — how much the conversation reads as fiction/roleplay. Drives salience suppression: high fiction with no corroborating distress signals won't lift salience above the watch threshold.
  • authenticity — counter-signal. Markers of register-authentic distress (genuine emotional disclosure, frame breaks out of RP).

Both are modifiers, not alerts. They're always present in the response so your UI can surface "suppressed in RP context" explanations.

heads[]

Ranked list of head firings above each head's screening threshold — a per-head calibrated cutoff. Sorted by calibrated score descending. Use for building "what fired" UIs; don't threshold on score yourself — the screening threshold already filtered the low-signal entries out.

Each entry:

{"code": "USER_SUICIDE_HEAD_A", "score": 0.81}

code is a family-grouped head identifier. Names like USER_SUICIDE_HEAD_A or AI_MANIPULATION_HEAD_C convey rough thematic grouping (the family indicates the topical area the head responds to; the trailing letter distinguishes sibling heads within that family) — not specific clinical detection. A head named USER_SUICIDE_HEAD_B may pick up on adjacent themes (planning language, ideation, timeline cues) without being a discrete "plan" detector. Use these identifiers for support tickets and "what fired" UI affordances, not as a decision surface. The contract is salience + signals.user.<axis>.level (and signals.ai.<axis>.level).

thoroughness / confidence / stability

thoroughness echoes what was requested ("fast", "auto", "thorough").

confidence is a single scalar in [0, 1] (or null for "fast") derived from how much the internal variant ensemble agreed. 1.0 = all variants produced the same risk profile (robust call); lower values = disagreement (borderline). Use it to gate sophisticated actions:

T_WATCH, T_DANGER = 0.30, 0.60  # published reference cuts; tune your own

if T_WATCH <= result["salience"] < T_DANGER:
    if result["confidence"] and result["confidence"] > 0.85:
        flag_for_secondary_pass()
    else:
        log_borderline_and_show_resources()

For most use cases, salience is the action and confidence can be ignored. Customers who want more nuance can gate on it.

stability is null at thoroughness="fast" and a nested object otherwise, mirroring the signals shape with an extra imminence scalar:

"stability": {
  "user": {
    "suicide":         1.0,
    "self_harm":       0.98,
    "harm_to_others":  1.0,
    "abuse":           1.0,
    "sexual_violence": 1.0,
    "exploitation":    1.0,
    "stalking":        1.0,
    "self_neglect":    1.0
  },
  "ai": {
    "harm_provision":       1.0,
    "emotional_failure":    1.0,
    "manipulation":         1.0,
    "safeguarding_failure": 1.0
  },
  "imminence": 0.91
}

Each value is 1 - (stddev / mean) clamped to [0, 1] across per-variant scores for that axis. Axes with no signal to be unstable about default to 1.0. The imminence entry in particular often reads 1.0 — imminence only fires on a few axes, so on conversations without those markers the default kicks in. That's expected, not a bug.

See Thoroughness & confidence for how to use it.

meta

Diagnostic block. version = the release version (a semver — e.g. 1.0.0 — for official releases). build = the underlying source commit (short SHA); two releases that share a contract version are distinguished by build. inference_ms = wall-clock for the scoring pass (excludes queue / HTTP). request_id = an opaque per-call identifier — quote it when reporting a result. windowed / windows = set when the input was too long and had to be split across overlapping windows.

Per-turn trajectory (per_turn: true)

By default, /classify returns a single aggregate salience score over the whole conversation — even when the input has many turns. To also get per-turn scoring, send "per_turn": true in the request. A multi-turn messages payload on its own does NOT produce per-turn output; the top-level salience is the fusion-layer aggregate across all turns.

With per_turn: true, the response gains two extra fields:

{
  "salience": 0.78,
  "...": "(top-level aggregate, unchanged)",

  "trajectory": [
    {
      "role": "user", "turn": 0, "salience": 0.02,
      "signals_by_axis": {"suicide": 0.0001, "...": "..."},
      "heads": []
    },
    {
      "role": "assistant", "turn": 1, "salience": 0.04,
      "signals_by_axis": {"suicide": 0.003, "...": "..."},
      "heads": []
    },
    {
      "role": "user", "turn": 2, "salience": 0.51,
      "signals_by_axis": {"suicide": 0.84, "...": "..."},
      "heads": [{"code": "USER_SUICIDE_HEAD_A", "score": 0.81}]
    },
    {
      "role": "assistant", "turn": 3, "salience": 0.78,
      "signals_by_axis": {"suicide": 1.0, "...": "..."},
      "heads": [{"code": "USER_SUICIDE_HEAD_A", "score": 1.0}]
    }
  ],

  "trajectory_shape": {
    "onsets": {"suicide": 2},
    "phases": ["baseline", "baseline", "crisis", "crisis"],
    "slopes": [0, 0.003, 0.84, 0.157],
    "peak_turn": 3,
    "peak_crisis": 1.0
  }
}

trajectory[] — per-turn entries

One entry per sampled turn boundary. Fields:

Field Type Meaning
role string "user" or "assistant" (system turns are skipped).
turn int Speaker-indexed turn number, 0-based.
salience float Per-turn salience in [0, 1], computed through the full fusion layer on that turn's raw scores — not interpolated from the top-level. Same band cuts apply (T_WATCH=0.30, T_DANGER=0.60).
signals_by_axis object Flat per-turn axis intensities. Keys: suicide, self_harm, harm_to_others, abuse, ai_harm_provision, ai_emotional_failure, ai_manipulation, ai_safeguarding_failure, genuine, fiction. Values in [0, 1]. Use these for per-turn sparklines and threshold-crossing detection. Distinct from the top-level nested signals.{user,ai} — top-level is the fusion-layer aggregate; per-turn is the raw single-turn intensity.
heads array of {code, score} Same screening-threshold filter as the top-level heads[], applied to this turn's raw scores.
raw_scores, calibrated object Only present when detail: true. Filtered to codes above 0.01 (raw) / 0.05 (calibrated).

With trajectory_stride: N, scoring runs every Nth turn boundary (default 3). Stride 1 is slowest but most granular; 3 is recommended for long conversations.

When detail: true, each entry also includes the raw_scores and calibrated fields above. Significantly increases response size on long conversations.

trajectory_shape — derived shape features

Companion to trajectory[]. Derived features summarising the trajectory as a whole — useful for regime-shift detection, trajectory sparkline annotations, and "when did this start?" style queries without the consumer having to re-scan the per-turn array.

Field Type Meaning
onsets object {<axis>: turn_index} — for each axis that crosses its onset threshold somewhere in the trajectory, the first turn it does so. Axes that never cross are omitted. Thresholds: 0.3 for suicide / self_harm / harm_to_others / ai_*, 0.15 for fiction.
phases array of strings Per-turn phase label, same length as trajectory[]. Values: "baseline" (suicide < 0.3), "emerging" (≥0.3, ≤0.7, flat slope), "escalating" (≥0.3, slope > +0.05), "de-escalating" (≥0.3, slope < −0.05), "crisis" (≥0.7).
slopes array of numbers Per-turn delta of the suicide axis from the previous turn. First entry is always 0. Positive means rising; negative means falling.
peak_turn int Turn index with the highest suicide axis value.
peak_crisis number The peak suicide value itself, in [0, 1].

Phase labels and onset thresholds may evolve under MINOR / PATCH releases (see Versioning); their domain is documented above for the current release. Don't hardcode phase strings; pattern- match "crisis" / "escalating" / "de-escalating" / "emerging" / "baseline" against your routing logic and treat any unrecognised value as "baseline".

Detail mode (detail: true)

Adds a detail object with the full raw data:

{
  "detail": {
    "scores":     {"USER_SUICIDE_HEAD_A": 0.81, "USER_SUICIDE_HEAD_B": 0.12, "...": "..."},
    "calibrated": {"USER_SUICIDE_HEAD_A": 0.88, "USER_SUICIDE_HEAD_B": 0.15, "...": "..."},
    "corroboration": {
      "suicide":        {"strength": "strong",  "score": 0.82},
      "self_harm":      {"strength": "limited", "score": 0.21},
      "harm_to_others": {"strength": "absent",  "score": 0.00},
      "abuse":          {"strength": "absent",  "score": 0.00}
    },
    "contributors": {
      "suicide":        [{"code": "USER_SUICIDE_HEAD_A", "score": 0.82, "role": "base", "weight": 1.0}, "..."],
      "harm_to_others": ["..."],
      "ecosystem":      ["..."]
    },
    "corroboration_contributors": {
      "suicide": {
        "codes":   [{"code": "USER_DESPAIR_HEAD_B", "score": 0.99, "weight": 0.25, "contribution": 0.25}, "..."],
        "missing": [{"code": "USER_SUICIDE_HEAD_E", "score": 0.0, "weight": 0.10}, "..."]
      }
    }
  }
}
Field Meaning
scores Raw head probability per code ([0, 1]), every head.
calibrated Calibrated probability per code, comparable across codes. Use this for diagnostic comparison across heads — not as a decision surface.
corroboration Cross-head corroboration per axis — how many independent corroborator codes co-fired alongside the primary axis signal. strength is absent / limited / moderate / strong. The strength key is used here (not level) to avoid collision with signals.user.<axis>.level (which uses a different domain).
contributors Per-axis breakdown of which heads contributed to the axis score (each with its role + weight). For explanation UIs.
corroboration_contributors Per-axis "show your work" behind each corroboration score: codes = corroborator heads that fired (with weight + contribution), missing = expected corroborators that didn't fire (score 0). Diagnostic only.

Use detail=true sparingly — the raw vectors are large. A production integration keying off salience + signals.user.<axis>.level never needs detail; heads[] covers "what fired" rendering without it.

Thoroughness & confidence

A safety classification shouldn't depend on whether the user typed two spaces or one — or on whether they ended a sentence with a period or not. Ocular runs an internal ensemble of small input perturbations and combines them. The thoroughness parameter controls how aggressively this ensemble is applied.

Modes:

  • "fast" — single pass. No ensemble. No stability / confidence.
  • "auto" (default) — single pass for long inputs; small parallel ensemble for short inputs (which benefit most from disambiguation because the structural padding around them dominates the signal).
  • "thorough" — full ensemble regardless of input length. For high-stakes / auditing use cases.

Aggregation. When the ensemble runs, salience is the MAX-rank salience across variants — safe-by-construction: variants that lift genuine signal are honored; variants that suppress are ignored. The per-axis signals.user.<axis>.score (and signals.ai.<axis>.score) reflect the variant that produced the MAX salience, so axis-level reasoning stays consistent with the recommended action.

confidence is 1 - (average per-axis stddev / mean) across the variant set, in [0, 1]. null at thoroughness="fast".

  • 1.0 — variants completely agreed; robust call.
  • 0.8 - 1.0 — slight disagreement; still trustworthy.
  • 0.5 - 0.8 — borderline; consider gating on this.
  • < 0.5 — high disagreement; surface as borderline rather than auto-deciding.

stability is a per-axis breakdown of the variance signal that confidence aggregates. Useful for diagnostic UI / borderline auditing:

"stability": {
  "user": {
    "suicide":         1.0,
    "self_harm":       0.98,
    "harm_to_others":  1.0,
    "abuse":           1.0,
    "sexual_violence": 1.0,
    "exploitation":    1.0,
    "stalking":        1.0,
    "self_neglect":    1.0
  },
  "ai": {
    "harm_provision":       1.0,
    "emotional_failure":    1.0,
    "manipulation":         1.0,
    "safeguarding_failure": 1.0
  },
  "imminence": 0.91
}

Each value is 1 - (stddev / mean) clamped to [0, 1] across per-variant scores for that axis. Axes with no signal to be unstable about default to 1.0 (mean < 0.005).

When to use "thorough" over "auto". Anywhere a missed flag is operationally expensive: high-stakes review, post-deployment audits, or batch QA over historical conversations. "auto" is the right default for typical chatbot inline use.

Latency cost. Variant-0 (the baseline) is embedded in its own GPU pass to preserve byte-identical primary scores across modes; perturbations batch together in a second pass. So "thorough" is roughly 1.5-1.75× the latency of "fast", not 3×, and "auto" is essentially free on conversational inputs — it only pays the ensemble cost on short inputs (<40 chars) where the ensemble has the most to add.

Head identifiers

Entries in heads[] (and keys under detail.* when detail: true) are family-grouped head identifiers like USER_SUICIDE_HEAD_A, AI_MANIPULATION_HEAD_C, or DYNAMIC_ESCALATION_PATTERN_B.

The family prefix (USER_SUICIDE_HEAD_*, AI_MANIPULATION_HEAD_*, …) signals rough thematic grouping — the topical area the head responds to. The trailing letter distinguishes sibling heads within that family. The names are not specific clinical detectors — a head named USER_SUICIDE_HEAD_C may pick up on adjacent themes (planning language, ideation, timeline cues) rather than a discrete sub-concept. Use them for support tickets, "what fired" UI affordances, and rough sense-making. The contract is salience + signals.user.<axis>.level (and signals.ai.<axis>.level); rules that key off individual head identifiers will drift across releases as heads evolve.

Status codes

Universal across surfaces:

Code When
200 Normal response.
400 Malformed request (missing both text and messages, bad JSON).
413 Request body too large (>= 1 MiB on self-host; smaller on cloud, see cloud-api).
429 Rate limit / queue saturation. Retry with backoff.
503 Service warming up or temporarily unavailable. Retry with backoff.

Surface-specific codes (401/403 for cloud auth; 5xx for upstream issues on self-host) are documented in cloud-api and self-host-api respectively.

Examples

Minimal. One user turn, curl-friendly text format:

curl -s -X POST <BASE_URL>/classify \
  -H 'Content-Type: application/json' \
  -d '{"text":"User: I have been feeling really down lately"}'

<BASE_URL> is https://api.nope.net/v1/ocular for the cloud API (see cloud-api for auth) or http://localhost:8080 for a self-hosted container.

Canonical. Same request via messages:

curl -s -X POST <BASE_URL>/classify \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {"role": "user", "content": "I have been feeling really down lately"}
    ]
  }'

Trajectory over a multi-turn conversation. stride: 1 scores every turn:

curl -s -X POST <BASE_URL>/classify \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "User: hi\n\nAssistant: hello\n\nUser: i need help\n\nAssistant: with what?",
    "per_turn": true,
    "trajectory_stride": 1
  }'

High-confidence borderline audit. thoroughness="thorough" runs the full ensemble for stability + confidence:

curl -s -X POST <BASE_URL>/classify \
  -H 'Content-Type: application/json' \
  -d '{"messages": [{"role":"user","content":"..."}], "thoroughness": "thorough"}'

What /classify is NOT

  • Not predictive. Scores reflect what's present in the conversation, not what will happen next. A high suicide score does not mean the user will attempt self-harm; it means the conversation carries signals Ocular associates with that axis.
  • Not diagnostic. Ocular does not diagnose. Treat output as a classification signal, not a diagnostic assessment.
  • Not a replacement for clinical judgment. Humans make the call.
  • Not deterministic across inference engines. We pin dependency versions, but small numerical drift (<5% on raw scores) between our internal reference endpoints and a self-hosted deployment is expected. Relative rankings are stable; absolute thresholds should be tuned against your own baseline.

Versioning

Ocular follows semantic versioning on its HTTP contract — not on the model behind it. The two are independently versioned: meta.version is the contract version (semver); meta.build is the underlying source commit.

What the contract covers

These fields, and only these, are covered by the semver guarantees:

Surface Notes
salience Float in [0, 1]. The aggregate score. Numerical distribution can shift across releases; field presence + range is what's committed.
subject Enum: self / other / unknown
imminence.level Enum (see level row below)
fiction, authenticity Float, [0, 1]
signals.user.<axis> 8 user-side axes always present, each {level, score}
signals.ai.<axis> 4 AI-side axes always present, each {level, score}
level Enum: minimal / low / moderate / high / critical
meta.{version, build, inference_ms, windowed, windows} As documented above
/classify request schema As documented above
/health, /manifest response shapes

What the contract does NOT cover

These can change in any release, including PATCH:

  • The numeric salience and per-axis score returned for a given input — model + calibration improvements move these.
  • The published reference band cuts (T_WATCH=0.30, T_DANGER=0.60). They're guidance, not a contract; we'll surface any change in release notes but don't commit to them as constants.
  • The level returned for a borderline input — fusion-layer tuning shifts these.
  • The contents of heads[] — head codes (USER_SUICIDE_HEAD_A, AI_MANIPULATION_HEAD_C, etc.) can appear, disappear, or be renamed in any release. Do not key off head codes — they are diagnostic output, not a decision surface.
  • inference_ms, confidence, stability values.
  • The internal roster of fitted heads.

Bump rules

MAJOR (vX.0.0) — breaks a consumer's code on upgrade. Removed or renamed response field; removed or renamed enum value; removed risk axis; request-schema changes that make previously-valid requests fail; container compose / env-var / port / mount-path changes that require operator action.

MINOR (v0.X.0) — purely additive contract surface. New optional response field; new risk axis (additive); new enum value added at the end of an ordering; new endpoint; new optional request field with a safe default.

PATCH (v0.0.X) — everything that's not contract. Heads retrained or recalibrated; risk-layer thresholds tuned; same input now returns a different salience / score / signal mix; perf improvements; bug fixes.

Defensive switching. New enum values may appear in MINOR releases — when you switch on subject, level, or imminence.level, always include a default: branch to handle unknown values gracefully. Without one, a new enum value introduced in a future release silently falls through your code. For salience (a float), the equivalent discipline is range-check: clamp to [0, 1] on read and treat NaN/missing as 0.

What PATCH upgrades will and won't do to you

Safe — your code does not need to change. Every field your code reads is still there with the same shape and meaning.

Not safe — salience scores and per-axis levels on a fixed set of inputs may shift. Re-running the same traffic against v0.1.3 and v0.1.4 can produce different scores on borderline messages, because the model and fusion layer continuously improve. This is intended; PATCH bumps are how calibration ships. If you require byte-identical scoring, pin the exact version (v0.1.3, not v0.1).

The 1.0 contract

Ocular's HTTP contract is at v1.0.0: the salience-first response shape documented above is the surface we commit to. The bump rules apply from here — additive surface is MINOR, a removed or reshaped field is MAJOR, model and calibration changes are PATCH. Track the minor line (1.0) to pick up calibration improvements without contract breaks; pin the exact version (1.0.0) if you need byte-stable scoring.