Hallucination detection
Measured classifier accuracy on internal benchmark
View evaluation note
Safety
Safety is embedded at every layer of the stack, from routing and input handling to runtime enforcement and post-hoc audits. Claims on this page are evidence-first: every numeric claim links to an Evaluation Note that documents the dataset, protocol, sample size, version, date, and limitations.
Pre-inference: controls and practices that reduce risk before inference.
Runtime: controls and practices that reduce risk during inference.
Post-hoc: controls and practices that reduce risk after inference.
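To make the three stages concrete, here is a minimal Python sketch of a request passing through pre-inference, runtime, and post-hoc steps. Every name in it (`Request`, `handle`, the empty-prompt check, the `max_chars` limit) is a hypothetical placeholder chosen for illustration, not a description of the controls actually deployed.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Request:
    prompt: str
    output: str = ""
    audit_log: List[str] = field(default_factory=list)


# All checks below are hypothetical placeholders, not the product's real controls.
def pre_inference(req: Request) -> None:
    """Before inference: reject inputs that fail a basic check."""
    if not req.prompt.strip():
        raise ValueError("empty prompt rejected before inference")
    req.audit_log.append("pre-inference: input accepted")


def runtime(req: Request, model: Callable[[str], str], max_chars: int = 200) -> None:
    """During inference: call the model and enforce an output length limit."""
    req.output = model(req.prompt)[:max_chars]
    req.audit_log.append("runtime: output length enforced")


def post_hoc(req: Request) -> None:
    """After inference: record what happened for later audit."""
    req.audit_log.append(f"post-hoc: {len(req.output)} chars recorded")


def handle(prompt: str, model: Callable[[str], str]) -> Request:
    """Run one request through the three stages in order."""
    req = Request(prompt=prompt)
    pre_inference(req)
    runtime(req, model)
    post_hoc(req)
    return req
```

Calling `handle("hello", some_model)` returns a `Request` whose `audit_log` holds one entry per stage, which is the kind of trail a post-hoc audit could review.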
Key metrics are surfaced here and link to Evaluation Notes that record dataset, protocol, sample sizes, limitations, and source artifacts.
Measured classifier accuracy on internal benchmark
View evaluation note
Correlation to correctness on held-out tasks
View evaluation note
Representative median P95 for on-prem profiles
View evaluation note
If a metric is important to your evaluation, request the full protocol and reproducibility notes via the contact form.
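As an illustration of what an Evaluation Note records, the fields named on this page (dataset, protocol, sample size, version, date, limitations, source artifacts) could be modeled as a small record with a completeness check. This is a sketch under assumptions: the class name, field names, and the rule that every field must be non-empty are hypothetical, not the actual note format.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List


@dataclass(frozen=True)
class EvaluationNote:
    # Field names are illustrative; they mirror the items listed on this page.
    dataset: str
    protocol: str
    sample_size: int
    version: str
    evaluated_on: date
    limitations: List[str]
    source_artifacts: List[str]


def missing_fields(note: EvaluationNote) -> List[str]:
    """Return the names of fields that are empty or non-positive."""
    missing = []
    for f in fields(note):
        value = getattr(note, f.name)
        if isinstance(value, (str, list)) and len(value) == 0:
            missing.append(f.name)
        elif isinstance(value, int) and value <= 0:
            missing.append(f.name)
    return missing
```

A numeric claim would be published only when `missing_fields(note)` returns an empty list, so a note without a stated sample size or limitations is caught before it is linked.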
Institutional deployments follow a governance checklist: threat model, compliance mapping (e.g., HIPAA, FedRAMP), audit log retention policy, and a reviewer-approved Evaluation Note set for any numeric claims.
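The checklist above can be sketched as a simple gate that reports which items are still outstanding. The item keys and function names are assumptions made for this example; the actual checklist format is not specified on this page.

```python
from typing import Dict, List

# Hypothetical keys mirroring the four checklist items named above.
REQUIRED_ITEMS = [
    "threat_model",
    "compliance_mapping",          # e.g., HIPAA, FedRAMP
    "audit_log_retention_policy",
    "approved_evaluation_notes",   # reviewer-approved notes for numeric claims
]


def outstanding_items(checklist: Dict[str, bool]) -> List[str]:
    """List required items that are absent or not marked complete."""
    return [item for item in REQUIRED_ITEMS if not checklist.get(item, False)]


def ready_to_deploy(checklist: Dict[str, bool]) -> bool:
    """A deployment is ready only when no required item is outstanding."""
    return not outstanding_items(checklist)
```

Treating a missing key the same as an incomplete item keeps the gate conservative: a deployment is blocked unless every item has been explicitly signed off.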