Evaluation note

Intent classification

94% accuracy · FocusOS v0.1 · 2025-10-12

Dataset / task: FocusOS intent benchmark (200 prompts across four intent categories: task, recall, execution, and safety)

Protocol: The classifier was evaluated on top-1 intent correctness, with human adjudication, using the FocusOS v0.1 routing configuration.

Sample size: n=200

System version: FocusOS v0.1

Measured: 2025-10-12
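A minimal sketch of how a top-1 accuracy figure like this can be computed, assuming each prompt yields one predicted intent and one human-adjudicated reference intent. The note does not say how adjudication results were recorded, so the `AdjudicatedPrediction` record and its field names are hypothetical. The usage lines build synthetic records, not benchmark data, to show that 188 correct out of n=200 corresponds to the reported 94%.

```python
from dataclasses import dataclass


@dataclass
class AdjudicatedPrediction:
    # Hypothetical record shape; not taken from FocusOS.
    prompt_id: str
    predicted_intent: str    # top-1 intent returned by the classifier
    adjudicated_intent: str  # reference intent as decided by human adjudicators


def top1_accuracy(records: list[AdjudicatedPrediction]) -> float:
    """Fraction of prompts whose top-1 prediction matches the adjudicated intent."""
    if not records:
        raise ValueError("no records to score")
    correct = sum(1 for r in records if r.predicted_intent == r.adjudicated_intent)
    return correct / len(records)


# Illustration only: 188 matching and 12 mismatching synthetic records (n=200).
records = [AdjudicatedPrediction(f"p{i}", "task", "task") for i in range(188)]
records += [AdjudicatedPrediction(f"q{i}", "recall", "task") for i in range(12)]
print(f"{top1_accuracy(records):.0%}")  # prints "94%"
```

Because only the single highest-ranked intent is compared, a prompt counts as wrong even if the correct intent was the classifier's second choice.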

Limitations

  • English-only prompts; multilingual intents not measured
  • Benchmark covers short commands only; intent drift over long-form inputs is not measured

Sources

TR-2025-24
Thynaptic | Cognitive AI Research