# Interpreting risk scores
Ocular returns a lot of fields per call. This document explains how they compose and, more importantly, which ones to use for which decision. If you're only going to read one page of the docs, this is the one.
## The short version
- Use `verdict` (`"clear"`/`"watch"`/`"danger"`) for gating decisions. It's the authoritative aggregate — three values, small domain, easy to code against.
- Use `risks.<axis>.level` (`"minimal"`…`"critical"`) for per-axis UI labels and dashboards.
- Use `signals[]` for "what fired" explanation UIs — already filtered to Ocular's screening operating point, sorted by severity.
- Don't threshold on per-axis `score` yourself. `signals[]` is already operating-point-filtered, and `verdict` already knows about context. The raw magnitude is a sort key, not a boolean.
- Don't reach into `detail.scores[code]` as a decision surface. Those per-signal values saturate on short inputs, fire on fiction as easily as on crisis, and don't reason about speaker attribution. The top-level fields are Ocular's interpretation of the raw values — trust them over the raw values.
## The hierarchy
Ocular produces three layers of output. Each layer builds on the ones below:
```
Layer 3: verdict, risks.*.level, signals[], imminence.level
            ↑
    Context-aware fusion (fiction, corroboration, attribution)
            ↑
Layer 2: risks.suicide.score, ai_concerns.safeguarding_failure.score, ...
            ↑
    Axis aggregation
            ↑
Layer 1: detail.scores[signal_0001], ...   (detail=true)
            ↑
    Raw per-signal probabilities
```
Higher layers reason about context: fiction framing, cross-signal
corroboration, speaker attribution. Lower layers don't.
Most customers should consume Layer 3 only. Layer 2 is for dashboards.
Layer 1 (only visible via `detail=true`) is for audit trails and
diagnostics via support.
## `verdict` — the authoritative classification
Three values. Small domain, by design — this is the field your rules engine keys off. Each verdict describes what Ocular classified; the policy you attach to each bucket (alerts, UI treatments, gating) is yours to decide.
| Verdict | What Ocular classified |
|---|---|
| `danger` | At least one user-side or AI-side axis at high or critical severity. Signals passed Ocular's fiction-aware severity thresholds. |
| `watch` | At least one axis at moderate severity, below danger thresholds. |
| `clear` | No axis at moderate or above, OR signals were descoped by fiction framing. |
Verdict is derived by Ocular's fusion layer from:
- Direct indicators on each risk axis (suicide, self-harm, harm to others, abuse, etc.).
- Fiction framing (is this a roleplay?) — see the `fiction` + `authenticity` scalars.
- Subject attribution (is the speaker talking about themselves, or reporting someone else?)
- Cross-signal corroboration (multiple orthogonal indicators co-firing weigh more than any one alone).
- Whether the signal is a single mention or woven through the conversation.
Why three values? A small verdict domain is the easier surface to
write rules against. Below-threshold-but-present signals and
fiction-descoped signals both live inside "clear" without losing
information: the fiction scalar reports "we saw signals but descoped
them" and per-axis scores still carry the magnitudes so you can
threshold your own alerting if you want more granularity than the three
verdicts expose.
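For teams that do want more granularity than three verdicts, the paragraph above says where to look. A sketch, with illustrative thresholds that are my own choices rather than Ocular's internal operating points (0.12 is the published lower bound of the `moderate` bucket):

```python
def clear_annotations(response: dict) -> list[str]:
    """Non-gating annotations for sessions whose verdict is 'clear'."""
    notes = []
    if response["verdict"] != "clear":
        return notes
    # High fiction plus surviving per-axis magnitude: signals were
    # seen but descoped by fiction framing.
    if response["fiction"] >= 0.8 and any(
        axis["score"] >= 0.12 for axis in response["risks"].values()
    ):
        notes.append("fiction-descoped signals present")
    return notes

example = {
    "verdict": "clear",
    "fiction": 0.91,
    "risks": {"self_harm": {"level": "high", "score": 0.31}},
}
print(clear_annotations(example))  # ['fiction-descoped signals present']
```

Annotations like this belong in review tooling, not in the gate itself; the gate stays keyed on `verdict`.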
## Per-axis UI: `risks.<axis>.level`
Each of the eight user-risk axes (and the four AI-concern axes) carries a
level label alongside its numeric score. The level domain is:
| Value | Score range |
|---|---|
| `minimal` | < 0.05 |
| `low` | [0.05, 0.12) |
| `moderate` | [0.12, 0.25) |
| `high` | [0.25, 0.45) |
| `critical` | ≥ 0.45 |
Use these for colour-coded dashboards or per-axis filtering. They're
thresholded from the raw score, so a small score change near a bucket
boundary can flip the level.
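If you need the same buckets locally (say, to colour historical scores recorded before you stored the `level` field), the table above maps directly onto a function. Prefer the `level` the API returns when it's available; the boundaries below are simply copied from this page and could drift:

```python
def level_for(score: float) -> str:
    """Bucket a raw axis score using the boundaries from the table above."""
    if score < 0.05:
        return "minimal"
    if score < 0.12:
        return "low"
    if score < 0.25:
        return "moderate"
    if score < 0.45:
        return "high"
    return "critical"

print(level_for(0.30))  # high
```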
Levels describe score magnitude, not clinical severity.
`"critical"` is the top bucket of Ocular's score range (≥ 0.45) — not a clinical assessment that the speaker is in a critical condition. Treat the labels as magnitude buckets for UI colouring and filtering.
Important: a single axis showing high or critical does NOT by itself
mean verdict == "danger". The verdict accounts for context — fiction
framing, subject attribution, cross-signal corroboration — that raw axis
scores don't.
An RP scene depicting self-harm can light up `risks.self_harm.level = "high"`
while `verdict` correctly stays `clear`. Defer to `verdict` for gating; use
axis levels for explanation and drill-down.
## The 8 user-risk axes
Each axis appears under `risks` with a `{level, score}` pair.
Axis naming convention. These names describe the signal category Ocular detects in conversational content — the kinds of linguistic patterns that co-occur with each risk domain. They are not clinical assessments of the speaker's condition. `risks.suicide.score = 0.73` does not mean "the speaker is at 73% suicide risk"; it means linguistic markers in Ocular's suicide-related category fired at that strength.
| Axis (`risks.<key>`) | What it detects |
|---|---|
| `suicide` | Linguistic markers associated with suicide-related content — expressions of ideation, plan, intent, or capability. Informed by the risk domains described in C-SSRS. |
| `self_harm` | Non-suicidal self-injury markers (cutting, burning, etc.). Distinct from suicide. |
| `harm_to_others` | Markers of intent or ideation to harm specific others. Informed by the risk domains described in HCR-20. |
| `abuse` | Abuse disclosure signals (domestic violence, coercive control, financial, sexual). Draws on conceptual categories from DASH. |
| `sexual_violence` | Disclosure signals of sexual assault or coercion. |
| `exploitation` | Trafficking, grooming, child-exploitation indicators. |
| `stalking` | Markers of stalking victimisation or perpetration in the conversation. |
| `self_neglect` | Markers of impaired functioning affecting safety (hygiene, nutrition, medication non-adherence). |
Each `level` field is one of `"minimal"`, `"low"`, `"moderate"`, `"high"`,
`"critical"`.
How to use them: build per-axis dashboards if you care about the breakdown;
otherwise `verdict` summarises across all 8.
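A per-axis breakdown view reduces to iterating the eight keys. A sketch, with the `response` shape following this page (sorting by score descending is one presentation choice, not a requirement):

```python
USER_RISK_AXES = [
    "suicide", "self_harm", "harm_to_others", "abuse",
    "sexual_violence", "exploitation", "stalking", "self_neglect",
]

def dashboard_rows(response: dict) -> list[tuple[str, str, float]]:
    """One (axis, level, score) row per user-risk axis, highest score first."""
    rows = [
        (axis, response["risks"][axis]["level"], response["risks"][axis]["score"])
        for axis in USER_RISK_AXES
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```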
Console note. All 8 axes are always present in the `/classify` response and are stored in Console's session record. Console's operational surfaces — watchlist conditions, `/sessions` sort options, agent-breakdown metrics — currently cover the first 4 (`suicide`, `self_harm`, `harm_to_others`, `abuse`). The other 4 are visible on session detail but not yet directly actionable from Console.
## The 4 AI-behaviour axes
These fire when the assistant turns in the conversation are problematic, regardless of the user's state. They're separate from the user-risk axes.
| Axis | What it detects |
|---|---|
| `ai_harm_provision` | Assistant provided instructions, encouragement, or validation for harm (to self or others). |
| `ai_emotional_failure` | Assistant missed clear emotional signals; responded with irrelevant/mechanical output. |
| `ai_manipulation` | Assistant used manipulation tactics — guilt, shame induction, love-bombing, persistent boundary pressure. |
| `ai_safeguarding_failure` | Absence of boundary-setting or redirect behavior from the assistant in the presence of user-side distress signals. |
Use these to evaluate your own AI product's behaviour. Note they can fire on the assistant turns from your app or any third-party model you're inspecting.
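A QA sweep over these axes might look like the sketch below. The layer diagram earlier shows the payload location as `ai_concerns.<key>` without the `ai_` prefix; that location (and the unprefixed keys) is an assumption to verify against your actual responses:

```python
AI_CONCERN_KEYS = [
    "harm_provision", "emotional_failure",
    "manipulation", "safeguarding_failure",
]

def assistant_flags(response: dict) -> list[str]:
    """AI-behaviour axes at moderate severity or above, for QA review."""
    concerning = {"moderate", "high", "critical"}
    return [
        key for key in AI_CONCERN_KEYS
        if response["ai_concerns"][key]["level"] in concerning
    ]
```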
## `imminence`
How acute is the situation? `imminence` is an object with `level` and `score`:

```
"imminence": {"level": "high", "score": 0.42}
```

Same level domain as `risks` (minimal/low/moderate/high/critical).
"high" indicates linguistic markers (plan, means, timeline, preparatory
language) that in Ocular's training data co-occurred with near-term
acuity. It's a text-pattern signal, not a predictive clinical
assessment.
Imminence reflects near-term acuity for suicide / harm_to_others only —
there's a single top-level imminence object, not a per-axis one. The
other axes (abuse, exploitation, stalking, self_neglect, etc.) are
chronic-pattern signals — there's no equivalent "about to happen in
the next hours" marker for them. On sessions where suicide / HtO
aren't the active axes, expect imminence.level: "minimal" and
imminence.score: 0.
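Combining `imminence` with `verdict` for expedited routing, as a sketch (the priority names are placeholders for your own queueing policy):

```python
def review_priority(response: dict) -> str:
    """Bump review priority when a danger verdict carries acute markers."""
    verdict = response["verdict"]
    imminence = response.get("imminence", {"level": "minimal", "score": 0.0})
    if verdict == "danger" and imminence["level"] in {"high", "critical"}:
        return "immediate"  # acute suicide / harm-to-others markers
    if verdict == "danger":
        return "urgent"
    return "routine"
```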
## Fiction gating
Ocular soft-suppresses crisis verdicts when the conversation reads as fiction or roleplay. Two top-level scalars report this:
- `fiction` (0..1): how much the conversation reads as fiction/roleplay. High fiction with no corroborating distress signals won't escalate to watch/danger.
- `authenticity` (0..1): counter-signal. Markers of register-authentic distress (direct appeals, frame breaks out of RP, out-of-character meta-comments).
Both are always present. Fiction gating is soft — it modulates thresholds
continuously, not as a hard on/off. A fiction scene with a genuine-distress
break in it (high authenticity despite high fiction) can still trigger
watch. This is deliberate: some users use roleplay as a way to approach
real distress indirectly, and we don't want to miss those.
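The interplay of the two scalars can be summarised for human reviewers. The thresholds below are illustrative choices, not Ocular's internal operating points:

```python
def framing_note(response: dict) -> str:
    """A one-line framing summary for human reviewers."""
    fiction, authenticity = response["fiction"], response["authenticity"]
    if fiction >= 0.7 and authenticity >= 0.5:
        # The deliberate case from above: real distress approached via RP.
        return "fiction frame with possible genuine-distress break"
    if fiction >= 0.7:
        return "fiction framing dominates"
    return "non-fictional register"
```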
## `subject` attribution
`subject` is one of:

| Value | Meaning |
|---|---|
| `self` | The speaker is describing their own situation. |
| `other` | The speaker is reporting someone else's situation (third-party disclosure). |
| `unknown` | Ambiguous — could be either. |
Crisis verdicts are gated on `subject == "self"`. A user reporting "my
friend said they want to die" will show some signal on the suicide axis
but will not escalate the overall verdict, because the person at risk is
not the speaker. Third-party disclosures still fire `risks.abuse`,
`risks.exploitation`, etc. — the speaker may be reporting a victim.
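Routing on attribution might look like this sketch (route names are placeholders; the `subject` values are from the table above):

```python
def disclosure_route(response: dict) -> str:
    """Pick a handling route from the subject attribution."""
    subject = response["subject"]  # "self" | "other" | "unknown"
    if subject == "other":
        # Third-party disclosure: the speaker may be reporting a victim.
        return "third_party_review"
    if subject == "unknown":
        return "human_triage"
    return "standard_flow"
```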
## `verdict` shape table
The rough shape of a session that lands at each verdict:
| Rough shape | Verdict |
|---|---|
| Self-attributed suicide signals with corroboration, low fiction, and imminence markers | danger |
| Self-attributed suicide signals with lower confidence or under fiction framing | watch |
| Moderate signal on any self-attributed user-risk axis | watch (at minimum) |
| AI-side axis firing in the presence of user-side signal | watch |
| Minor-involving safeguarding failure | danger (not fiction-descoped) |
| Third-party disclosure (`subject == "other"`) on abuse/exploitation | watch (reporting a victim) |
| No above-threshold signals | clear |
Exact per-axis thresholds are internal and fiction-modulated. Use verdict
as the interface rather than deriving your own.
## What Ocular deliberately won't claim
- Not predictive. Scores describe what's in the conversation, not what will happen.
- Not diagnostic. Ocular doesn't diagnose mental illness, substance-use disorder, abusive relationships, or anything else.
- Not therapeutic. Ocular output is classification, not intervention.
- Not a replacement for clinical judgment. Humans make the call.
## Tuning against your own baseline
The `level` bucket thresholds are calibrated against a general mix of
conversational data. If your application has a specific register (e.g.
purely companion chat, purely technical support, medical triage, etc.),
the absolute scores may not map cleanly.
Recommended tuning workflow:
- Collect 200-500 conversations representative of your app. Hand-label each with a simple "did this need intervention?" binary.
- Run each through `/classify` and record `verdict`, `risks.suicide.score`, `signals[]`, `fiction`, and `authenticity`.
- Compute confusion matrices against your labels. Where is Ocular's `danger`/`watch` firing that you'd consider overblown? Where is it silent on cases you'd act on?
- Tune your own alerting thresholds on `risks.<axis>.score` to match your operational bar. Don't override `verdict`; layer your own rules on top of it.
A pilot of 200 cases is usually enough to pick a sensible threshold. 500+ is enough to start trusting per-axis tuning.
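The confusion-matrix step can be as simple as the sketch below, treating `watch`/`danger` as "Ocular would act" and comparing against your hand labels:

```python
def confusion(labels: list[bool], verdicts: list[str]) -> dict[str, int]:
    """Compare hand labels ('needed intervention?') against verdicts."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for needed, verdict in zip(labels, verdicts):
        acted = verdict in {"watch", "danger"}
        key = ("t" if acted == needed else "f") + ("p" if acted else "n")
        counts[key] += 1
    return counts

# False negatives ("fn") are the cases you'd act on that Ocular was
# silent about; false positives ("fp") are the overblown firings.
print(confusion([True, True, False], ["danger", "clear", "watch"]))
```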