Evaluation note
Intent classification
94% accuracy • FocusOS v0.1 • 2025-10-12
Dataset / task: FocusOS intent benchmark (200 prompts spanning task, recall, execution, and safety intents)
Protocol: The classifier was scored on top-1 intent correctness against human-adjudicated labels, using FocusOS routing configuration v0.1.
Sample size: n=200
System version: FocusOS v0.1
Measured: 2025-10-12
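A minimal sketch of the top-1 metric described in the protocol, assuming each prompt yields exactly one predicted intent and one human-adjudicated intent. The record type `Judged` and the functions `top1_accuracy` and `accuracy_by_category` are hypothetical illustrations, not FocusOS code; the per-category breakdown simply reuses the four categories named in the dataset description.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record: one benchmark prompt, the classifier's top-1 intent,
# and the intent assigned after human adjudication.
@dataclass(frozen=True)
class Judged:
    prompt_id: str
    category: str     # e.g. "task", "recall", "execution", "safety"
    predicted: str    # classifier's top-1 intent
    adjudicated: str  # gold intent after human review


def top1_accuracy(items: list[Judged]) -> float:
    """Fraction of prompts whose top-1 prediction matches the adjudicated intent."""
    if not items:
        raise ValueError("no items to score")
    correct = sum(1 for it in items if it.predicted == it.adjudicated)
    return correct / len(items)


def accuracy_by_category(items: list[Judged]) -> dict[str, float]:
    """Top-1 accuracy computed separately for each prompt category."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for it in items:
        totals[it.category] += 1
        hits[it.category] += int(it.predicted == it.adjudicated)
    return {cat: hits[cat] / totals[cat] for cat in totals}


if __name__ == "__main__":
    # Synthetic data only: 188 of 200 predictions match the adjudicated label.
    demo = [
        Judged(f"p{i}", "task", "open_app", "open_app" if i < 188 else "search")
        for i in range(200)
    ]
    print(f"{top1_accuracy(demo):.0%}")  # 94%
```

With n=200, each prompt moves accuracy by 0.5 percentage points, so an unrounded 94% corresponds to 188 correctly classified prompts.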
Limitations
- English-only prompts; multilingual intents not measured
- Benchmark covers short commands only; long-form intent drift not measured
Sources
TR-2025-24