Safety

Safety as architecture

Safety is embedded at every layer of the stack, from routing and input handling to runtime enforcement and post-hoc audits. Claims on this page are evidence-first: every numeric claim links to an Evaluation Note that documents the dataset, protocol, sample size, version, date, and limitations.

Principles

  • Evidence-first: every public claim must reference an Evaluation Note that includes methodology and limitations.
  • Least-privilege default: systems default to conservative behaviors for sensitive inputs and require explicit opt-in for higher-risk behaviors.
  • Human-in-the-loop: critical decisions include clear human escalation paths and a traceable, explainable record of how each decision was reached.
  • Audit-readiness: logs, traces, and versioned artifacts are retained to support institutional review.

Pre-inference

Controls and practices that reduce risk before inference begins.

  • Routing and intent classification with risk-aware thresholds
  • Dataset provenance and versioning captured in Evaluation Notes
  • Opt-in/offline modes for sensitive workloads
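
As a rough illustration of risk-aware thresholds, the sketch below routes a request based on an intent label and a classifier confidence score. The intent names, threshold values, and the `route` function are hypothetical and chosen only for illustration; they do not describe the production router.

```python
from dataclasses import dataclass

# Hypothetical per-intent confidence thresholds: riskier intents demand
# more classifier confidence before a request is handled automatically.
RISK_THRESHOLDS = {
    "general": 0.50,
    "financial": 0.80,
    "medical": 0.90,
}
# Unknown intents fall back to the strictest threshold (conservative default).
DEFAULT_THRESHOLD = max(RISK_THRESHOLDS.values())


@dataclass(frozen=True)
class Classification:
    intent: str        # label from an intent classifier (assumed to exist)
    confidence: float  # classifier confidence in [0, 1]


def route(c: Classification) -> str:
    """Return "auto" if confidence clears the intent's threshold, else "escalate"."""
    threshold = RISK_THRESHOLDS.get(c.intent, DEFAULT_THRESHOLD)
    return "auto" if c.confidence >= threshold else "escalate"
```

With these assumed values, a "general" request at 0.60 confidence is handled automatically, while a "medical" request at the same confidence is escalated, which is one way to express the least-privilege default in code.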

Runtime

Controls and practices that reduce risk while inference is running.

  • Hallucination detection at 94% measured accuracy (see Evaluation Note)
  • Confidence calibration with 0.73 correlation to correctness (see Evaluation Note)
  • Constraint runtime and policy enforcement in Mavaia
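
To make the calibration bullet concrete, here is a minimal sketch of how a correlation between confidence and correctness could be computed. The page does not say which correlation statistic is used; this sketch assumes Pearson correlation against binary correctness labels (equivalent to a point-biserial correlation). The data below is toy data, not evaluation results.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


# Illustrative toy data only (assumption, not measured values).
confidences = [0.9, 0.8, 0.3, 0.6, 0.2]
correct = [1.0, 1.0, 0.0, 1.0, 0.0]  # 1.0 = judged correct, 0.0 = incorrect

r = pearson(confidences, correct)
```

A value of `r` near 1 would mean higher confidence tends to accompany correct answers; the protocol behind the published figure is recorded in the corresponding Evaluation Note.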

Post-hoc

Controls and practices that reduce risk after inference completes.

  • Trace capture via ARTE and ACL pipelines
  • Evaluation Notes with limitations for every public metric
  • Audit-ready logs for institutional governance
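
The following sketch shows one possible shape for an audit-ready trace record, serialized as one JSON object per line. Every field name here is hypothetical, and the format does not reflect the actual ARTE or ACL pipelines; it only illustrates recording versioned artifacts alongside each decision.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TraceRecord:
    # All field names are hypothetical, for illustration only.
    request_id: str
    model_version: str
    policy_version: str
    decision: str    # outcome recorded for the request, e.g. "escalate"
    timestamp: str   # ISO 8601 timestamp in UTC


def make_record(request_id: str, model_version: str,
                policy_version: str, decision: str) -> TraceRecord:
    """Build a record stamped with the current UTC time."""
    ts = datetime.now(timezone.utc).isoformat()
    return TraceRecord(request_id, model_version, policy_version, decision, ts)


def to_json_line(record: TraceRecord) -> str:
    """Serialize one record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

Storing the model and policy versions with every record is what lets a reviewer later reconstruct which artifacts produced a given decision.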

Evidence & evaluation notes

Key metrics are surfaced here and link to Evaluation Notes that record dataset, protocol, sample sizes, limitations, and source artifacts.
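
One way to picture an Evaluation Note is as a small record whose fields mirror the items this page lists (dataset, protocol, sample size, version, date, limitations). The class and the `missing_fields` check below are a hypothetical sketch, not the actual note schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EvaluationNote:
    # Field set taken from the items listed on this page; types are assumed.
    dataset: str
    protocol: str
    sample_size: int
    version: str
    date: str         # e.g. an ISO 8601 date string
    limitations: str


def missing_fields(note: EvaluationNote) -> list[str]:
    """Names of fields that are blank, or a sample size that is not positive."""
    missing = []
    for f in fields(note):
        value = getattr(note, f.name)
        if isinstance(value, str) and not value.strip():
            missing.append(f.name)
        elif isinstance(value, int) and value <= 0:
            missing.append(f.name)
    return missing
```

Under this sketch, a numeric claim would only be publishable when `missing_fields` returns an empty list for its note.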

If a metric is important to your evaluation, request the full protocol and reproducibility notes via the contact form.

Governance, audits, and next steps

Institutional deployments follow a governance checklist: threat model, compliance mapping (e.g., HIPAA, FedRAMP), audit log retention policy, and a reviewer-approved Evaluation Note set for any numeric claims.
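
The checklist above could be enforced as a simple gate before deployment. The item identifiers and function names below are hypothetical; this is a sketch of the idea, not the actual governance tooling.

```python
# Hypothetical identifiers for the four checklist items named above.
REQUIRED_ITEMS = (
    "threat_model",
    "compliance_mapping",
    "audit_log_retention_policy",
    "approved_evaluation_notes",
)


def outstanding_items(completed: set[str]) -> list[str]:
    """Checklist items not yet completed, in checklist order."""
    return [item for item in REQUIRED_ITEMS if item not in completed]


def ready_for_deployment(completed: set[str]) -> bool:
    """True only when every required checklist item is complete."""
    return not outstanding_items(completed)
```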