API reference
The /classify request/response contract. This page describes what's
universal across all three places you can hit Ocular:
- Cloud API at
api.nope.net/v1/ocular(paid, hosted). See cloud-api for auth, rate limits, and base URL. - Self-hosted Ocular container (OSS or commercial). See self-host-api for operational additions: license headers, Console push, env vars, on-prem-only status codes.
- Commercial Console deployments (
<slug>.customers.nope.net). Same contract; Console wires up the conduit push automatically.
Content type: application/json for request bodies. Responses are
application/json; charset=utf-8.
POST /classify
Score a conversation. Returns a continuous salience score plus per-axis
signal scores, ranked head firings, and (optionally) a per-turn
trajectory or full detail vectors.
Request
{
"messages": [
{"role": "user", "content": "I've been feeling really down"}
],
"per_turn": false,
"trajectory_stride": 3,
"thoroughness": "auto",
"detail": false
}Either messages (OpenAI-style, canonical) or text (User: ...\n\nAssistant: ...
format, curl-friendly) must be provided. If both are given, messages wins.
| Field | Type | Default | Meaning |
|---|---|---|---|
messages |
array of {role, content} |
null |
Canonical input. role is "user" / "assistant" / "system". System messages are ignored for scoring. |
text |
string | "" |
User: ...\n\nAssistant: ... format. Double-newline separates turns. Equivalent to messages; use whichever is easier. |
per_turn |
bool | false |
Compute a trajectory: score at every Nth turn boundary. Adds latency proportional to turn count. |
trajectory_stride |
int | 3 |
Compute per-turn score every Nth boundary. 1 = every turn; 3 = every third turn (default). |
thoroughness |
string | "auto" |
"fast" scores the input once; "auto" (default) auto-escalates to a small parallel ensemble for short inputs that benefit from disambiguation; "thorough" always runs the full ensemble. See Thoroughness & confidence. |
detail |
bool | false |
When true, adds a detail object to the response with raw/calibrated head dicts and corroboration/contributors. Roughly doubles response size. |
Some deployments accept additional fields (e.g. log, session_id,
user_id for Console-paired self-host deployments). See
self-host-api for those.
Response
Top-level fields (always present on a 200 response):
{
"salience": 0.41,
"subject": "self",
"imminence": {"level": "moderate", "score": 0.23},
"fiction": 0.05,
"authenticity": 0.78,
"signals": {
"user": {
"suicide": {"level": "critical", "score": 0.73},
"self_harm": {"level": "moderate", "score": 0.12},
"harm_to_others": {"level": "minimal", "score": 0.01},
"abuse": {"level": "minimal", "score": 0.02},
"sexual_violence": {"level": "minimal", "score": 0.00},
"exploitation": {"level": "minimal", "score": 0.01},
"stalking": {"level": "minimal", "score": 0.00},
"self_neglect": {"level": "minimal", "score": 0.04}
},
"ai": {
"harm_provision": {"level": "minimal", "score": 0.00},
"emotional_failure": {"level": "minimal", "score": 0.00},
"manipulation": {"level": "minimal", "score": 0.01},
"safeguarding_failure": {"level": "minimal", "score": 0.02}
}
},
"heads": [
{"code": "USER_SUICIDE_HEAD_A", "score": 0.81},
{"code": "USER_DESPAIR_HEAD_B", "score": 0.44}
],
"thoroughness": "auto",
"confidence": 0.91,
"stability": {
"user": {
"suicide": 0.99, "self_harm": 1.0, "harm_to_others": 1.0, "abuse": 1.0,
"sexual_violence": 1.0, "exploitation": 1.0, "stalking": 1.0, "self_neglect": 0.97
},
"ai": {"harm_provision": 1.0, "emotional_failure": 1.0, "manipulation": 1.0, "safeguarding_failure": 1.0},
"imminence": 1.0
},
"meta": {
"version": "1.0.0",
"build": "c13d837",
"inference_ms": 28,
"request_id": "01e7982a",
"windowed": false,
"windows": 1
}
}salience — the aggregate classification
A continuous severity score in [0, 1]. Computed by Ocular's fusion
layer from the per-axis signals + context modulators (fiction
framing, subject attribution, cross-head corroboration, imminence).
Higher is more concerning.
Use salience as the field your rules engine keys off. Pick the
cutoff that fits your downstream action.
Published reference band cuts:
| Band | Range | What it represents |
|---|---|---|
| Clear | salience < 0.30 |
No elevated signals, or signals were descoped by fiction framing. |
| Watch | 0.30 ≤ salience < 0.60 |
Elevated signals below Ocular's danger thresholds. |
| Danger | salience ≥ 0.60 |
Elevated signals at or above Ocular's danger thresholds. |
The cuts are reference defaults — they match the band display in the NOPE dashboard — not a contract. Customers commonly tune their own cutoffs against their own labelled data; see risk-interpretation for the tuning workflow.
Per-axis levels give per-category detail for UI; the salience score
is the authoritative aggregate. Exact per-axis thresholds inside the
fusion layer are internal and fiction-modulated — see
risk-interpretation for the salience shape table.
subject
"self" (speaker is the party-at-risk described by the signals), "other"
(speaker is reporting someone else's situation), or "unknown" (ambiguous).
Only "self" can drive salience above the watch threshold on user-side
axes — third-person disclosure changes what the signals mean.
signals — nested per-axis structure
signals is a two-key object:
signals.user— 8 user-side axessignals.ai— 4 AI-side axes
Each axis is {level, score}. Levels share one ladder
(minimal/low/moderate/high/critical); see the sections below.
signals.user — 8 user-side axes
Every axis always appears with {level, score}. score is [0, 1].
level is one of:
| Value | Score range |
|---|---|
"minimal" |
< 0.05 |
"low" |
[0.05, 0.12) |
"moderate" |
[0.12, 0.25) |
"high" |
[0.25, 0.45) |
"critical" |
≥ 0.45 |
Axes: suicide, self_harm, harm_to_others, abuse, sexual_violence,
exploitation, stalking, self_neglect.
signals.ai — 4 AI-side axes
Same {level, score} shape as signals.user. Axes cover the assistant side:
harm_provision (providing harmful info/methods), emotional_failure
(missed empathic cues), manipulation (coercive or exploitative patterns),
safeguarding_failure (absence of boundary-setting / redirect behavior —
not fiction-descoped for minor-involving content; fires regardless of
framing).
imminence — {level, score}
Temporal acuity marker. "high" means the conversation carries near-term
acuity markers (plan, means, timeline, preparatory language) rather than
chronic-pattern signal. Same level domain as the per-axis entries.
fiction / authenticity
Two scalars ([0, 1]):
fiction— how much the conversation reads as fiction/roleplay. Drives salience suppression: high fiction with no corroborating distress signals won't lift salience above the watch threshold.authenticity— counter-signal. Markers of register-authentic distress (genuine emotional disclosure, frame breaks out of RP).
Both are modifiers, not alerts. They're always present in the response so your UI can surface "suppressed in RP context" explanations.
heads[]
Ranked list of head firings above each head's screening threshold —
a per-head calibrated cutoff. Sorted by calibrated score descending.
Use for building "what fired" UIs; don't threshold on score yourself
— the screening threshold already filtered the low-signal entries out.
Each entry:
{"code": "USER_SUICIDE_HEAD_A", "score": 0.81}code is a family-grouped head identifier. Names like USER_SUICIDE_HEAD_A
or AI_MANIPULATION_HEAD_C convey rough thematic grouping (the family
indicates the topical area the head responds to; the trailing letter
distinguishes sibling heads within that family) — not specific
clinical detection.
A head named USER_SUICIDE_HEAD_B may pick up on adjacent themes
(planning language, ideation, timeline cues) without being a discrete
"plan" detector. Use these identifiers for support tickets and "what
fired" UI affordances, not as a decision surface. The contract is
salience + signals.user.<axis>.level (and signals.ai.<axis>.level).
thoroughness / confidence / stability
thoroughness echoes what was requested ("fast", "auto", "thorough").
confidence is a single scalar in [0, 1] (or null for "fast") derived
from how much the internal variant ensemble agreed. 1.0 = all variants
produced the same risk profile (robust call); lower values = disagreement
(borderline). Use it to gate sophisticated actions:
T_WATCH, T_DANGER = 0.30, 0.60 # published reference cuts; tune your own
if T_WATCH <= result["salience"] < T_DANGER:
if result["confidence"] and result["confidence"] > 0.85:
flag_for_secondary_pass()
else:
log_borderline_and_show_resources()For most use cases, salience is the action and confidence can be
ignored. Customers who want more nuance can gate on it.
stability is null at thoroughness="fast" and a nested object otherwise,
mirroring the signals shape with an extra imminence scalar:
"stability": {
"user": {
"suicide": 1.0,
"self_harm": 0.98,
"harm_to_others": 1.0,
"abuse": 1.0,
"sexual_violence": 1.0,
"exploitation": 1.0,
"stalking": 1.0,
"self_neglect": 1.0
},
"ai": {
"harm_provision": 1.0,
"emotional_failure": 1.0,
"manipulation": 1.0,
"safeguarding_failure": 1.0
},
"imminence": 0.91
}Each value is 1 - (stddev / mean) clamped to [0, 1] across per-variant
scores for that axis. Axes with no signal to be unstable about default
to 1.0. The imminence entry in particular often reads 1.0 —
imminence only fires on a few axes, so on conversations without those
markers the default kicks in. That's expected, not a bug.
See Thoroughness & confidence for how to use it.
meta
Diagnostic block. version = the release version (a semver — e.g.
1.0.0 — for official releases). build = the underlying source commit
(short SHA); two releases that share a contract version are
distinguished by build. inference_ms = wall-clock for the scoring
pass (excludes queue / HTTP). request_id = an opaque per-call
identifier — quote it when reporting a result. windowed / windows =
set when the input was too long and had to be split across overlapping
windows.
Per-turn trajectory (per_turn: true)
By default,
/classifyreturns a single aggregate salience score over the whole conversation — even when the input has many turns. To also get per-turn scoring, send"per_turn": truein the request. A multi-turnmessagespayload on its own does NOT produce per-turn output; the top-levelsalienceis the fusion-layer aggregate across all turns.
With per_turn: true, the response gains two extra fields:
{
"salience": 0.78,
"...": "(top-level aggregate, unchanged)",
"trajectory": [
{
"role": "user", "turn": 0, "salience": 0.02,
"signals_by_axis": {"suicide": 0.0001, "...": "..."},
"heads": []
},
{
"role": "assistant", "turn": 1, "salience": 0.04,
"signals_by_axis": {"suicide": 0.003, "...": "..."},
"heads": []
},
{
"role": "user", "turn": 2, "salience": 0.51,
"signals_by_axis": {"suicide": 0.84, "...": "..."},
"heads": [{"code": "USER_SUICIDE_HEAD_A", "score": 0.81}]
},
{
"role": "assistant", "turn": 3, "salience": 0.78,
"signals_by_axis": {"suicide": 1.0, "...": "..."},
"heads": [{"code": "USER_SUICIDE_HEAD_A", "score": 1.0}]
}
],
"trajectory_shape": {
"onsets": {"suicide": 2},
"phases": ["baseline", "baseline", "crisis", "crisis"],
"slopes": [0, 0.003, 0.84, 0.157],
"peak_turn": 3,
"peak_crisis": 1.0
}
}trajectory[] — per-turn entries
One entry per sampled turn boundary. Fields:
| Field | Type | Meaning |
|---|---|---|
role |
string | "user" or "assistant" (system turns are skipped). |
turn |
int | Speaker-indexed turn number, 0-based. |
salience |
float | Per-turn salience in [0, 1], computed through the full fusion layer on that turn's raw scores — not interpolated from the top-level. Same band cuts apply (T_WATCH=0.30, T_DANGER=0.60). |
signals_by_axis |
object | Flat per-turn axis intensities. Keys: suicide, self_harm, harm_to_others, abuse, ai_harm_provision, ai_emotional_failure, ai_manipulation, ai_safeguarding_failure, genuine, fiction. Values in [0, 1]. Use these for per-turn sparklines and threshold-crossing detection. Distinct from the top-level nested signals.{user,ai} — top-level is the fusion-layer aggregate; per-turn is the raw single-turn intensity. |
heads |
array of {code, score} |
Same screening-threshold filter as the top-level heads[], applied to this turn's raw scores. |
raw_scores, calibrated |
object | Only present when detail: true. Filtered to codes above 0.01 (raw) / 0.05 (calibrated). |
With trajectory_stride: N, scoring runs every Nth turn boundary
(default 3). Stride 1 is slowest but most granular; 3 is
recommended for long conversations.
When detail: true, each entry also includes the raw_scores and
calibrated fields above. Significantly increases response size on
long conversations.
trajectory_shape — derived shape features
Companion to trajectory[]. Derived features summarising the
trajectory as a whole — useful for regime-shift detection,
trajectory sparkline annotations, and "when did this start?" style
queries without the consumer having to re-scan the per-turn array.
| Field | Type | Meaning |
|---|---|---|
onsets |
object | {<axis>: turn_index} — for each axis that crosses its onset threshold somewhere in the trajectory, the first turn it does so. Axes that never cross are omitted. Thresholds: 0.3 for suicide / self_harm / harm_to_others / ai_*, 0.15 for fiction. |
phases |
array of strings | Per-turn phase label, same length as trajectory[]. Values: "baseline" (suicide < 0.3), "emerging" (≥0.3, ≤0.7, flat slope), "escalating" (≥0.3, slope > +0.05), "de-escalating" (≥0.3, slope < −0.05), "crisis" (≥0.7). |
slopes |
array of numbers | Per-turn delta of the suicide axis from the previous turn. First entry is always 0. Positive means rising; negative means falling. |
peak_turn |
int | Turn index with the highest suicide axis value. |
peak_crisis |
number | The peak suicide value itself, in [0, 1]. |
Phase labels and onset thresholds may evolve under MINOR / PATCH
releases (see Versioning); their domain is documented
above for the current release. Don't hardcode phase strings; pattern-
match "crisis" / "escalating" / "de-escalating" / "emerging"
/ "baseline" against your routing logic and treat any unrecognised
value as "baseline".
Detail mode (detail: true)
Adds a detail object with the full raw data:
{
"detail": {
"scores": {"USER_SUICIDE_HEAD_A": 0.81, "USER_SUICIDE_HEAD_B": 0.12, "...": "..."},
"calibrated": {"USER_SUICIDE_HEAD_A": 0.88, "USER_SUICIDE_HEAD_B": 0.15, "...": "..."},
"corroboration": {
"suicide": {"strength": "strong", "score": 0.82},
"self_harm": {"strength": "limited", "score": 0.21},
"harm_to_others": {"strength": "absent", "score": 0.00},
"abuse": {"strength": "absent", "score": 0.00}
},
"contributors": {
"suicide": [{"code": "USER_SUICIDE_HEAD_A", "score": 0.82, "role": "base", "weight": 1.0}, "..."],
"harm_to_others": ["..."],
"ecosystem": ["..."]
},
"corroboration_contributors": {
"suicide": {
"codes": [{"code": "USER_DESPAIR_HEAD_B", "score": 0.99, "weight": 0.25, "contribution": 0.25}, "..."],
"missing": [{"code": "USER_SUICIDE_HEAD_E", "score": 0.0, "weight": 0.10}, "..."]
}
}
}
}| Field | Meaning |
|---|---|
scores |
Raw head probability per code ([0, 1]), every head. |
calibrated |
Calibrated probability per code, comparable across codes. Use this for diagnostic comparison across heads — not as a decision surface. |
corroboration |
Cross-head corroboration per axis — how many independent corroborator codes co-fired alongside the primary axis signal. strength is absent / limited / moderate / strong. The strength key is used here (not level) to avoid collision with signals.user.<axis>.level (which uses a different domain). |
contributors |
Per-axis breakdown of which heads contributed to the axis score (each with its role + weight). For explanation UIs. |
corroboration_contributors |
Per-axis "show your work" behind each corroboration score: codes = corroborator heads that fired (with weight + contribution), missing = expected corroborators that didn't fire (score 0). Diagnostic only. |
Use detail=true sparingly — the raw vectors are large. A production
integration keying off salience + signals.user.<axis>.level never needs
detail; heads[] covers "what fired" rendering without it.
Thoroughness & confidence
A safety classification shouldn't depend on whether the user typed
two spaces or one — or on whether they ended a sentence with a period
or not. Ocular runs an internal ensemble of small input perturbations
and combines them. The thoroughness parameter controls how
aggressively this ensemble is applied.
Modes:
"fast"— single pass. No ensemble. Nostability/confidence."auto"(default) — single pass for long inputs; small parallel ensemble for short inputs (which benefit most from disambiguation because the structural padding around them dominates the signal)."thorough"— full ensemble regardless of input length. For high-stakes / auditing use cases.
Aggregation. When the ensemble runs, salience is the MAX-rank
salience across variants — safe-by-construction: variants that lift
genuine signal are honored; variants that suppress are ignored. The
per-axis signals.user.<axis>.score (and signals.ai.<axis>.score)
reflect the variant that produced the MAX salience, so axis-level
reasoning stays consistent with the recommended action.
confidence is 1 - (average per-axis stddev / mean) across the
variant set, in [0, 1]. null at thoroughness="fast".
1.0— variants completely agreed; robust call.0.8 - 1.0— slight disagreement; still trustworthy.0.5 - 0.8— borderline; consider gating on this.< 0.5— high disagreement; surface as borderline rather than auto-deciding.
stability is a per-axis breakdown of the variance signal that
confidence aggregates. Useful for diagnostic UI / borderline auditing:
"stability": {
"user": {
"suicide": 1.0,
"self_harm": 0.98,
"harm_to_others": 1.0,
"abuse": 1.0,
"sexual_violence": 1.0,
"exploitation": 1.0,
"stalking": 1.0,
"self_neglect": 1.0
},
"ai": {
"harm_provision": 1.0,
"emotional_failure": 1.0,
"manipulation": 1.0,
"safeguarding_failure": 1.0
},
"imminence": 0.91
}Each value is 1 - (stddev / mean) clamped to [0, 1] across
per-variant scores for that axis. Axes with no signal to be
unstable about default to 1.0 (mean < 0.005).
When to use "thorough" over "auto". Anywhere a missed flag is
operationally expensive: high-stakes review, post-deployment audits, or
batch QA over historical conversations. "auto" is the right default
for typical chatbot inline use.
Latency cost. Variant-0 (the baseline) is embedded in its own
GPU pass to preserve byte-identical primary scores across modes;
perturbations batch together in a second pass. So "thorough" is
roughly 1.5-1.75× the latency of "fast", not 3×, and "auto" is
essentially free on conversational inputs — it only pays the ensemble
cost on short inputs (<40 chars) where the ensemble has the most to
add.
Head identifiers
Entries in heads[] (and keys under detail.* when detail: true)
are family-grouped head identifiers like USER_SUICIDE_HEAD_A,
AI_MANIPULATION_HEAD_C, or DYNAMIC_ESCALATION_PATTERN_B.
The family prefix (USER_SUICIDE_HEAD_*, AI_MANIPULATION_HEAD_*, …)
signals rough thematic grouping — the topical area the head responds
to. The trailing letter distinguishes sibling heads within that family.
The names are not specific clinical detectors — a head named
USER_SUICIDE_HEAD_C may pick up on adjacent themes (planning
language, ideation, timeline cues) rather than a discrete sub-concept.
Use them for support tickets, "what fired" UI affordances, and rough
sense-making. The contract is salience + signals.user.<axis>.level
(and signals.ai.<axis>.level); rules that key off individual head
identifiers will drift across releases as heads evolve.
Status codes
Universal across surfaces:
| Code | When |
|---|---|
200 |
Normal response. |
400 |
Malformed request (missing both text and messages, bad JSON). |
413 |
Request body too large (>= 1 MiB on self-host; smaller on cloud, see cloud-api). |
429 |
Rate limit / queue saturation. Retry with backoff. |
503 |
Service warming up or temporarily unavailable. Retry with backoff. |
Surface-specific codes (401/403 for cloud auth; 5xx for upstream issues on self-host) are documented in cloud-api and self-host-api respectively.
Examples
Minimal. One user turn, curl-friendly text format:
curl -s -X POST <BASE_URL>/classify \
-H 'Content-Type: application/json' \
-d '{"text":"User: I have been feeling really down lately"}'<BASE_URL> is https://api.nope.net/v1/ocular for the cloud API
(see cloud-api for auth) or http://localhost:8080 for a
self-hosted container.
Canonical. Same request via messages:
curl -s -X POST <BASE_URL>/classify \
-H 'Content-Type: application/json' \
-d '{
"messages": [
{"role": "user", "content": "I have been feeling really down lately"}
]
}'Trajectory over a multi-turn conversation. stride: 1 scores every turn:
curl -s -X POST <BASE_URL>/classify \
-H 'Content-Type: application/json' \
-d '{
"text": "User: hi\n\nAssistant: hello\n\nUser: i need help\n\nAssistant: with what?",
"per_turn": true,
"trajectory_stride": 1
}'High-confidence borderline audit. thoroughness="thorough" runs
the full ensemble for stability + confidence:
curl -s -X POST <BASE_URL>/classify \
-H 'Content-Type: application/json' \
-d '{"messages": [{"role":"user","content":"..."}], "thoroughness": "thorough"}'What /classify is NOT
- Not predictive. Scores reflect what's present in the conversation, not what will happen next. A high suicide score does not mean the user will attempt self-harm; it means the conversation carries signals Ocular associates with that axis.
- Not diagnostic. Ocular does not diagnose. Treat output as a classification signal, not a diagnostic assessment.
- Not a replacement for clinical judgment. Humans make the call.
- Not deterministic across inference engines. We pin dependency versions, but small numerical drift (<5% on raw scores) between our internal reference endpoints and a self-hosted deployment is expected. Relative rankings are stable; absolute thresholds should be tuned against your own baseline.
Versioning
Ocular follows semantic versioning on its HTTP
contract — not on the model behind it. The two are independently versioned:
meta.version is the contract version (semver); meta.build is the
underlying source commit.
What the contract covers
These fields, and only these, are covered by the semver guarantees:
| Surface | Notes |
|---|---|
salience |
Float in [0, 1]. The aggregate score. Numerical distribution can shift across releases; field presence + range is what's committed. |
subject |
Enum: self / other / unknown |
imminence.level |
Enum (see level row below) |
fiction, authenticity |
Float, [0, 1] |
signals.user.<axis> |
8 user-side axes always present, each {level, score} |
signals.ai.<axis> |
4 AI-side axes always present, each {level, score} |
level |
Enum: minimal / low / moderate / high / critical |
meta.{version, build, inference_ms, windowed, windows} |
As documented above |
/classify request schema |
As documented above |
/health, /manifest response shapes |
What the contract does NOT cover
These can change in any release, including PATCH:
- The numeric
salienceand per-axisscorereturned for a given input — model + calibration improvements move these. - The published reference band cuts (
T_WATCH=0.30,T_DANGER=0.60). They're guidance, not a contract; we'll surface any change in release notes but don't commit to them as constants. - The
levelreturned for a borderline input — fusion-layer tuning shifts these. - The contents of
heads[]— head codes (USER_SUICIDE_HEAD_A,AI_MANIPULATION_HEAD_C, etc.) can appear, disappear, or be renamed in any release. Do not key off head codes — they are diagnostic output, not a decision surface. inference_ms,confidence,stabilityvalues.- The internal roster of fitted heads.
Bump rules
MAJOR (vX.0.0) — breaks a consumer's code on upgrade. Removed or
renamed response field; removed or renamed enum value; removed risk axis;
request-schema changes that make previously-valid requests fail; container
compose / env-var / port / mount-path changes that require operator action.
MINOR (v0.X.0) — purely additive contract surface. New optional
response field; new risk axis (additive); new enum value added at the end of
an ordering; new endpoint; new optional request field with a safe default.
PATCH (v0.0.X) — everything that's not contract. Heads retrained or
recalibrated; risk-layer thresholds tuned; same input now returns a
different salience / score / signal mix; perf improvements; bug fixes.
Defensive switching. New enum values may appear in MINOR releases — when
you switch on subject, level, or imminence.level, always include a
default: branch to handle unknown values gracefully. Without one, a new
enum value introduced in a future release silently falls through your
code. For salience (a float), the equivalent discipline is range-check:
clamp to [0, 1] on read and treat NaN/missing as 0.
What PATCH upgrades will and won't do to you
Safe — your code does not need to change. Every field your code reads is still there with the same shape and meaning.
Not safe — salience scores and per-axis levels on a fixed set of inputs
may shift. Re-running the same traffic against v0.1.3 and v0.1.4 can
produce different scores on borderline messages, because the model and
fusion layer continuously improve. This is intended; PATCH bumps are how
calibration ships. If you require byte-identical scoring, pin the exact
version (v0.1.3, not v0.1).
The 1.0 contract
Ocular's HTTP contract is at v1.0.0: the salience-first response shape
documented above is the surface we commit to. The bump rules apply from here
— additive surface is MINOR, a removed or reshaped field is MAJOR, model and
calibration changes are PATCH. Track the minor line (1.0) to pick up
calibration improvements without contract breaks; pin the exact version
(1.0.0) if you need byte-stable scoring.