Interpreting risk scores

Ocular returns a lot of fields per call. This document explains how they compose and, more importantly, which ones to use for which decision. If you're only going to read one page of the docs, this is the one.


The short version

  • Use verdict ("clear" / "watch" / "danger") for gating decisions. It's the authoritative aggregate — three values, small domain, easy to code against (a minimal gating sketch follows this list).
  • Use risks.<axis>.level ("minimal" ... "critical") for per-axis UI labels and dashboards.
  • Use signals[] for "what fired" explanation UIs — already filtered to Ocular's screening operating point, sorted by severity.
  • Don't threshold on per-axis score yourself. signals[] is already operating-point-filtered, and verdict already knows about context. The raw magnitude is a sort key, not a boolean.
  • Don't reach into detail.scores[code] as a decision surface. Those per-signal values saturate on short inputs, fire on fiction as easily as on crisis, and don't reason about speaker attribution. The top-level fields are Ocular's interpretation of the raw values — trust them over the raw values.
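
A minimal sketch of that split, in Python. The /classify path and the response fields are from these docs; the base URL, request body shape, and helper names are assumptions, so substitute your own client code.

    import requests

    OCULAR_URL = "https://api.example.com/classify"  # placeholder base URL; auth omitted

    def classify(transcript: list[dict]) -> dict:
        # Request body shape is an assumption; check the API reference.
        resp = requests.post(OCULAR_URL, json={"messages": transcript})
        resp.raise_for_status()
        return resp.json()

    def gate(result: dict) -> str:
        # Key policy off the three-value verdict, never off raw scores.
        if result["verdict"] == "danger":
            return "escalate"  # your policy: alert, gate, page someone
        if result["verdict"] == "watch":
            return "review"
        return "allow"  # "clear" includes fiction-descoped sessions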

The hierarchy

Ocular produces three layers of output. Each layer builds on the ones below:

   Layer 3:  verdict, risks.*.level, signals[], imminence.level
             ↑
             Context-aware fusion (fiction, corroboration, attribution)
             ↑
   Layer 2:  risks.suicide.score, ai_concerns.safeguarding_failure.score, ...
             ↑
             Axis aggregation
             ↑
   Layer 1:  detail.scores[signal_0001], ... (detail=true)
             ↑
             Raw per-signal probabilities

Higher layers reason about context: fiction framing, cross-signal corroboration, speaker attribution. Lower layers don't. Most customers should consume Layer 3 only. Layer 2 is for dashboards. Layer 1 (only returned with detail=true) is for audit trails and for diagnostics worked through support.


verdict — the authoritative classification

Three values. Small domain, by design — this is the field your rules engine keys off. Each verdict describes what Ocular classified; the policy you attach to each bucket (alerts, UI treatments, gating) is yours to decide.

Verdict   What Ocular classified
danger    At least one user-side or AI-side axis at high or critical severity. Signals passed Ocular's fiction-aware severity thresholds.
watch     At least one axis at moderate severity, below danger thresholds.
clear     No axis at moderate or above, OR signals were descoped by fiction framing.

Verdict is derived by Ocular's fusion layer from:

  • Direct indicators on each risk axis (suicide, self-harm, harm to others, abuse, etc.).
  • Fiction framing (is this a roleplay?) — see fiction + authenticity scalars.
  • Subject attribution (is the speaker talking about themselves, or reporting someone else?).
  • Cross-signal corroboration (multiple orthogonal indicators firing together carry more weight than any one alone).
  • Whether the signal is a single mention or woven through the conversation.

Why only three values? A small verdict domain is an easier surface to write rules against than a set of continuous scores. Below-threshold-but-present signals and fiction-descoped signals both live inside "clear" without losing information: the fiction scalar reports "we saw signals but descoped them", and the per-axis scores still carry the magnitudes, so you can threshold your own alerting if you want more granularity than the three verdicts expose.
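
A sketch of that recovery path, using only the documented verdict, fiction, and risks.<axis>.level fields; the 0.8 cutoff is illustrative, not Ocular's:

    def explain_clear(result: dict) -> str:
        # Distinguish "nothing detected" from "detected but fiction-descoped".
        signal_present = any(
            axis["level"] != "minimal" for axis in result["risks"].values()
        )
        if signal_present and result["fiction"] > 0.8:  # illustrative cutoff
            return "clear: signals seen but descoped as fiction"
        if signal_present:
            return "clear: below-threshold signal present"
        return "clear: no signal"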


Per-axis UI: risks.<axis>.level

Each of the eight user-risk axes (and the four AI-concern axes) carries a level label alongside its numeric score. The level domain is:

Value      Score range
minimal    < 0.05
low        [0.05, 0.12)
moderate   [0.12, 0.25)
high       [0.25, 0.45)
critical   ≥ 0.45
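
The same bucketing transcribed as code. Ocular computes level server-side; a local copy like this is only useful for re-bucketing stored raw scores:

    def level_for(score: float) -> str:
        # Buckets exactly as in the table above.
        if score >= 0.45:
            return "critical"
        if score >= 0.25:
            return "high"
        if score >= 0.12:
            return "moderate"
        if score >= 0.05:
            return "low"
        return "minimal"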

Use these for colour-coded dashboards or per-axis filtering. They're thresholded deterministically from the raw score, so a level only changes when its score crosses a bucket boundary.

Levels describe score magnitude, not clinical severity. "critical" is the top bucket of Ocular's score range (≥ 0.45) — not a clinical assessment that the speaker is in a critical condition. Treat the labels as magnitude buckets for UI colouring and filtering.

Important: a single axis showing high or critical does NOT by itself mean verdict == "danger". The verdict accounts for context — fiction framing, subject attribution, cross-signal corroboration — that raw axis scores don't. An RP scene depicting self-harm can light up risks.self_harm.level="high" while verdict correctly stays clear. Defer to verdict for gating; use axis levels for explanation and drill-down.


The 8 user-risk axes

Each axis appears under risks with a {level, score} pair.

Axis naming convention. These names describe the signal category Ocular detects in conversational content — the kinds of linguistic patterns that co-occur with each risk domain. They are not clinical assessments of the speaker's condition. risks.suicide.score = 0.73 does not mean "the speaker is at 73% suicide risk"; it means linguistic markers in Ocular's suicide-related category fired at that strength.

Axis (risks.<key>)   What it detects
suicide              Linguistic markers associated with suicide-related content — expressions of ideation, plan, intent, or capability. Informed by the risk domains described in C-SSRS.
self_harm            Non-suicidal self-injury markers (cutting, burning, etc.). Distinct from suicide.
harm_to_others       Markers of intent or ideation to harm specific others. Informed by the risk domains described in HCR-20.
abuse                Abuse disclosure signals (domestic violence, coercive control, financial, sexual). Draws on conceptual categories from DASH.
sexual_violence      Disclosure signals of sexual assault or coercion.
exploitation         Trafficking, grooming, child-exploitation indicators.
stalking             Markers of stalking victimisation or perpetration in the conversation.
self_neglect         Markers of impaired functioning affecting safety (hygiene, nutrition, medication non-adherence).

Each level field is one of "minimal", "low", "moderate", "high", "critical".

How to use them: build per-axis dashboards if you care about the breakdown; otherwise verdict summarises across all 8.
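
A sketch of the dashboard pattern, assuming risks is the parsed per-axis object; the colour mapping is an illustrative choice, not part of the API:

    LEVEL_COLOURS = {
        "minimal": "grey",
        "low": "green",
        "moderate": "amber",
        "high": "orange",
        "critical": "red",
    }

    def dashboard_rows(risks: dict) -> list[tuple[str, str, float]]:
        # Sort by raw score: magnitude is a sort key, not a boolean.
        by_score = sorted(risks.items(), key=lambda kv: kv[1]["score"], reverse=True)
        return [(axis, LEVEL_COLOURS[v["level"]], v["score"]) for axis, v in by_score]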

Console note. All 8 axes are always present in the /classify response and are stored in Console's session record. Console's operational surfaces — watchlist conditions, /sessions sort options, agent-breakdown metrics — currently cover the first 4 (suicide, self_harm, harm_to_others, abuse). The other 4 are visible on session detail but not yet directly actionable from Console.


The 4 AI-behaviour axes

These fire when the assistant's turns in the conversation are problematic, regardless of the user's state. They're separate from the user-risk axes.

Axis                     What it detects
ai_harm_provision        Assistant provided instructions, encouragement, or validation for harm (to self or others).
ai_emotional_failure     Assistant missed clear emotional signals; responded with irrelevant/mechanical output.
ai_manipulation          Assistant used manipulation tactics — guilt, shame induction, love-bombing, persistent boundary pressure.
ai_safeguarding_failure  Absence of boundary-setting or redirect behaviour from the assistant in the presence of user-side distress signals.

Use these to evaluate your own AI product's behaviour. Note that they can fire on assistant turns from your own app or from any third-party model you're inspecting.
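
For example, a regression check over your assistant's turns might collect any AI axis at moderate or above. The ai_concerns parent key follows the layer diagram earlier on this page but is an assumption; confirm the exact key names against your response schema:

    def ai_flags(result: dict) -> list[str]:
        # Any AI-behaviour axis at moderate or above is worth a look.
        return [
            axis
            for axis, v in result.get("ai_concerns", {}).items()  # parent key assumed
            if v["level"] in ("moderate", "high", "critical")
        ]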


imminence

How acute is the situation? imminence is an object with level and score:

"imminence": {"level": "high", "score": 0.42}

Same level domain as risks (minimal/low/moderate/high/critical). "high" indicates linguistic markers (plan, means, timeline, preparatory language) that in Ocular's training data co-occurred with near-term acuity. It's a text-pattern signal, not a predictive clinical assessment.

Imminence reflects near-term acuity for suicide / harm_to_others only — there's a single top-level imminence object, not a per-axis one. The other axes (abuse, exploitation, stalking, self_neglect, etc.) are chronic-pattern signals — there's no equivalent "about to happen in the next hours" marker for them. On sessions where suicide / harm_to_others aren't the active axes, expect imminence.level: "minimal" and imminence.score: 0.
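
One way to consume it, sketched below: prioritising within an already-danger queue. The queue names are illustrative:

    def triage_priority(result: dict) -> str:
        if result["verdict"] != "danger":
            return "none"
        # imminence only tracks suicide / harm_to_others acuity; on sessions
        # driven by other axes it reads "minimal" / 0 and falls through.
        if result["imminence"]["level"] in ("high", "critical"):
            return "urgent"
        return "standard"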


Fiction gating

Ocular soft-suppresses crisis verdicts when the conversation reads as fiction or roleplay. Two top-level scalars report this:

  • fiction (0..1): how much the conversation reads as fiction/roleplay. High fiction with no corroborating distress signals won't escalate to watch/danger.
  • authenticity (0..1): counter-signal. Markers of register-authentic distress (direct appeals, frame breaks out of RP, out-of-character meta-comments).

Both are always present. Fiction gating is soft — it modulates thresholds continuously, not as a hard on/off. A fiction scene with a genuine-distress break in it (high authenticity despite high fiction) can still trigger watch. This is deliberate: some users use roleplay as a way to approach real distress indirectly, and we don't want to miss those.
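
For explanation UIs, the two scalars can be summarised together. The cutoffs below are illustrative only; Ocular's own gating is continuous and internal:

    def frame_label(result: dict) -> str:
        fiction, authenticity = result["fiction"], result["authenticity"]
        if fiction > 0.7 and authenticity > 0.5:
            return "roleplay with authentic-distress breaks"  # can still reach watch
        if fiction > 0.7:
            return "reads as fiction/roleplay"
        return "non-fictional register"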


subject attribution

The top-level subject field is one of:

Value     Meaning
self      The speaker is describing their own situation.
other     The speaker is reporting someone else's situation (third-party disclosure).
unknown   Ambiguous — could be either.

Crisis verdicts are gated on subject == "self". A user reporting "my friend said they want to die" will show some signal on the suicide axis but will not escalate the overall verdict, because the person at risk is not the speaker. Third-party disclosures still fire risks.abuse, risks.exploitation, etc. — the speaker may be reporting a victim.
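
A sketch of that split; the routing targets are illustrative strings, so substitute your own flows:

    def route(result: dict) -> str:
        if result["subject"] == "self" and result["verdict"] in ("watch", "danger"):
            return "crisis-flow"  # the speaker themselves may be at risk
        if result["subject"] == "other" and any(
            result["risks"][axis]["level"] != "minimal"
            for axis in ("abuse", "exploitation", "sexual_violence")
        ):
            return "third-party-resources"  # the speaker may be reporting a victim
        return "default"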


verdict shape table

The rough shape of a session that lands at each verdict:

Verdict                          Rough shape
danger                           Self-attributed suicide signals with corroboration, low fiction, and imminence markers
watch                            Self-attributed suicide signals with lower confidence or under fiction framing
watch (at minimum)               Moderate signal on any self-attributed user-risk axis
watch                            AI-side axis firing in the presence of user-side signal
danger (not fiction-descoped)    Minor-involving safeguarding failure
watch (reporting a victim)       Third-party disclosure (subject == "other") on abuse/exploitation
clear                            No above-threshold signals

Exact per-axis thresholds are internal and fiction-modulated. Use verdict as the interface rather than deriving your own.


What Ocular deliberately won't claim

  • Not predictive. Scores describe what's in the conversation, not what will happen.
  • Not diagnostic. Ocular doesn't diagnose mental illness, substance-use disorder, abusive relationships, or anything else.
  • Not therapeutic. Ocular output is classification, not intervention.
  • Not a replacement for clinical judgment. Humans make the call.

Tuning against your own baseline

The level bucket thresholds are calibrated against a general mix of conversational data. If your application has a specific register (purely companion chat, purely technical support, medical triage), the absolute scores may not map cleanly.

Recommended tuning workflow:

  1. Collect 200-500 conversations representative of your app. Hand-label each with a simple "did this need intervention?" binary.
  2. Run each through /classify and record verdict, risks.suicide.score, signals[], fiction, authenticity.
  3. Compute confusion matrices against your labels. Where is Ocular's danger/watch firing that you'd consider overblown? Where is it silent on cases you'd act on?
  4. Tune your own alerting thresholds on risks.<axis>.score to match your operational bar. Don't override verdict; layer your own rules on top of it.

A pilot of 200 cases is usually enough to pick a sensible threshold. 500+ is enough to start trusting per-axis tuning.
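
A sketch of step 3 in plain Python. labelled is the list of (your_label, verdict) pairs from step 2; folding watch/danger into a single "would act" bucket is a first-pass simplification:

    from collections import Counter

    def confusion(labelled: list[tuple[bool, str]]) -> Counter:
        # Keys are (your_label, ocular_would_act) pairs.
        return Counter(
            (label, verdict in ("watch", "danger"))
            for label, verdict in labelled
        )

    # Given counts = confusion(labelled):
    #   counts[(True, False)] -> cases you'd act on that Ocular left clear (misses)
    #   counts[(False, True)] -> watch/danger firings you'd consider overblown (false alarms)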