Intent Classification System

Specification for Mavaia's intent classification system, which achieves 78% accuracy across 12 query categories. Documents the lightweight classifier architecture, intent routing, and category definitions.

Report ID: TR-2025-20
Type: System Card
Date: 2025-11-03
Version: v1.0.0
Authors: Cognitive Architecture Team

Abstract

We present Mavaia's Intent Classification System, a lightweight classifier that categorizes user queries into 12 distinct intent types with 78% accuracy, enabling appropriate routing through the ACL pipeline.

1. Introduction

Effective AI assistance requires understanding not just what users say but what they want to accomplish. Intent Classification addresses this by assigning each query an actionable intent type that determines how the system routes it. Mavaia's Intent Classification System operates as the first component in the ACL pipeline, analyzing query structure, keywords, and context to select one of 12 intent categories: Quick Answer, Deep Research, Creative Generation, Code Assistance, Focus Session Management, Memory Retrieval, Emotional Support, Planning/Strategy, Debugging/Analysis, Configuration/Settings, Comparison/Evaluation, and Reflection/Thinking. Classification accuracy affects all downstream processing: correct intent selection ensures that queries reach the appropriate reasoning modules, memory retrieval strategies, and response generation patterns.
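The 12 categories above form a closed set of labels, which could be represented, for instance, as a Python `Enum`. The snake_case label strings below are assumptions for illustration; the report does not specify how categories are encoded.

```python
from enum import Enum


class Intent(Enum):
    """The 12 intent categories named in the report (label strings are hypothetical)."""

    QUICK_ANSWER = "quick_answer"
    DEEP_RESEARCH = "deep_research"
    CREATIVE_GENERATION = "creative_generation"
    CODE_ASSISTANCE = "code_assistance"
    FOCUS_SESSION_MANAGEMENT = "focus_session_management"
    MEMORY_RETRIEVAL = "memory_retrieval"
    EMOTIONAL_SUPPORT = "emotional_support"
    PLANNING_STRATEGY = "planning_strategy"
    DEBUGGING_ANALYSIS = "debugging_analysis"
    CONFIGURATION_SETTINGS = "configuration_settings"
    COMPARISON_EVALUATION = "comparison_evaluation"
    REFLECTION_THINKING = "reflection_thinking"


print(len(Intent))  # prints 12
```

Keeping the set closed means an unknown label fails loudly (`Intent("unknown")` raises `ValueError`) rather than silently falling through to a routing path.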

2. Methodology

Intent classification uses a hybrid approach that combines rule-based patterns with lightweight ML classification. Rule-based patterns detect explicit intent markers: question words (what, why, how), imperative verbs (create, analyze, compare), temporal indicators (today, later, schedule), memory references (remember, last time), and emotional language. The ML classifier is a small encoder model (80M parameters) trained on 15,000 labeled queries; it produces a probability distribution over the intent categories. The final classification combines rule-based confidence with the ML probabilities, and a query is classified confidently only when the combined confidence reaches the 0.65 threshold. Queries below the threshold are marked ambiguous and routed through conservative default paths. To resolve ambiguous queries, the system also draws on conversational context from the previous turn.
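A minimal sketch of this hybrid decision rule, assuming a simple linear blend of rule and ML scores. Only the 0.65 threshold comes from the report; the regex patterns, the intent each marker maps to, the 0.3 rule weight, and the 0.1 history boost are hypothetical. The encoder is not modeled: its output is passed in as a plain dictionary of probabilities, and previous-turn context is modeled as a small boost to the prior turn's intent when a query would otherwise be ambiguous.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

CONFIDENCE_THRESHOLD = 0.65  # threshold stated in the report

# Hypothetical marker patterns keyed by intent label. The marker families
# follow the report; the exact regexes and intent mappings are illustrative.
RULE_PATTERNS: Dict[str, List[str]] = {
    "quick_answer": [r"^(what|why|how)\b"],                        # question words
    "creative_generation": [r"\bcreate\b"],                        # imperative verbs
    "debugging_analysis": [r"\banalyze\b"],
    "comparison_evaluation": [r"\bcompare\b"],
    "planning_strategy": [r"\b(today|later|schedule)\b"],          # temporal indicators
    "memory_retrieval": [r"\b(remember|last time)\b"],             # memory references
    "emotional_support": [r"\b(stressed|anxious|overwhelmed)\b"],  # emotional language
}


@dataclass
class Classification:
    intent: Optional[str]  # None marks an ambiguous query
    confidence: float


def rule_scores(query: str) -> Dict[str, float]:
    """Split a rule confidence of 1.0 evenly across every intent whose markers match."""
    text = query.lower()
    hits = [intent for intent, patterns in RULE_PATTERNS.items()
            if any(re.search(p, text) for p in patterns)]
    return {intent: 1.0 / len(hits) for intent in hits}


def classify(
    query: str,
    ml_probs: Dict[str, float],            # stand-in for the encoder's output
    previous_intent: Optional[str] = None,
    rule_weight: float = 0.3,              # hypothetical blend weight
    history_boost: float = 0.1,            # hypothetical previous-turn boost
) -> Classification:
    rules = rule_scores(query)
    # Linear blend of rule confidence and encoder probability per intent.
    fused = {
        intent: rule_weight * rules.get(intent, 0.0)
        + (1.0 - rule_weight) * ml_probs.get(intent, 0.0)
        for intent in sorted(set(rules) | set(ml_probs))
    }
    if not fused:
        return Classification(None, 0.0)

    best = max(fused, key=fused.get)
    # Below threshold: boost the previous turn's intent if it is a candidate.
    if fused[best] < CONFIDENCE_THRESHOLD and previous_intent in fused:
        fused[previous_intent] += history_boost
        best = max(fused, key=fused.get)

    if fused[best] >= CONFIDENCE_THRESHOLD:
        return Classification(best, fused[best])
    return Classification(None, fused[best])  # ambiguous: conservative default route
```

For example, `classify("compare Rust and Go", {"comparison_evaluation": 0.6, "deep_research": 0.4})` blends the rule hit on "compare" with the encoder's 0.6 into roughly 0.72, which clears the threshold. A query such as "what about that" with encoder probability 0.8 for `memory_retrieval` lands at about 0.56 and is ambiguous on its own, but crosses 0.65 when the previous turn was also `memory_retrieval`.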

3. Results

Evaluation on 5,000 queries showed 78% overall accuracy. Per-category accuracy varied: Quick Answer (91%), Deep Research (85%), Creative Generation (82%), Code Assistance (87%), Focus Session Management (79%), Memory Retrieval (73%), Emotional Support (69%), Planning/Strategy (74%), Debugging/Analysis (81%), Configuration/Settings (88%), Comparison/Evaluation (76%), Reflection/Thinking (62%). The confident-classification rate (confidence above the 0.65 threshold) was 83%; the remaining 17% of queries were marked ambiguous and routed conservatively. Classification latency stayed under 50 ms and the memory footprint under 20 MB, consistent with the lightweight design. Integrating conversation history improved the resolution of ambiguous queries by 12 percentage points.
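The reported figures can be cross-checked with a few lines of arithmetic. The unweighted (macro) mean of the 12 per-category accuracies is about 78.9%, close to the 78% overall figure; the overall figure presumably weights each category by its share of the 5,000 evaluation queries, a distribution the report does not give, so the two need not match exactly. The 83%/17% split corresponds to 4,150 confidently classified and 850 ambiguous queries.

```python
# Per-category accuracies (%) as reported in Section 3.
per_category = {
    "Quick Answer": 91,
    "Deep Research": 85,
    "Creative Generation": 82,
    "Code Assistance": 87,
    "Focus Session Management": 79,
    "Memory Retrieval": 73,
    "Emotional Support": 69,
    "Planning/Strategy": 74,
    "Debugging/Analysis": 81,
    "Configuration/Settings": 88,
    "Comparison/Evaluation": 76,
    "Reflection/Thinking": 62,
}

# Unweighted mean: every category counts equally, regardless of query volume.
macro_mean = sum(per_category.values()) / len(per_category)
spread = max(per_category.values()) - min(per_category.values())

total_queries = 5000
confident = total_queries * 83 // 100  # 83% confident-classification rate
ambiguous = total_queries - confident

print(f"macro mean: {macro_mean:.1f}%")                    # prints "macro mean: 78.9%"
print(f"best-worst spread: {spread} points")               # prints "best-worst spread: 29 points"
print(f"confident: {confident}, ambiguous: {ambiguous}")   # prints "confident: 4150, ambiguous: 850"
```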

4. Discussion

The Intent Classification System shows that lightweight models can achieve strong intent understanding for AI assistant queries. The 78% overall accuracy appears sufficient for practical routing decisions, particularly given the conservative handling of ambiguous cases. Performance varies across categories in line with their inherent difficulty: Quick Answer and Configuration/Settings queries have clear linguistic markers (91% and 88% accuracy), while Emotional Support and Reflection/Thinking queries require subtler interpretation (69% and 62% accuracy). The sub-50 ms latency indicates that intent classification does not bottleneck the ACL pipeline. The hybrid approach combines the reliability of explicit rule-based patterns with the flexibility of learned representations.

5. Limitations

Current limitations include: (1) the fixed set of 12 intent categories may not cover every query type users encounter; (2) multi-intent queries (e.g., 'remember last time and create a similar document') are forced into a single category; (3) the 0.65 confidence threshold is manually tuned rather than adaptive; (4) conversation history integration considers only the previous turn rather than the full context; (5) the 80M-parameter model limits semantic understanding compared with larger encoders; (6) the training data covers English only, with no multilingual support; and (7) some category definitions overlap (e.g., Deep Research and Comparison/Evaluation have fuzzy boundaries).

6. Conclusion

Mavaia's Intent Classification System provides lightweight query categorization that enables appropriate routing through the ACL pipeline. Its 78% accuracy and sub-50 ms latency indicate that small models can classify AI assistant queries effectively. Future work will focus on adaptive confidence thresholds, multi-intent support, fuller integration of conversation context, broader training data coverage, and refined category definitions that reduce overlap while maintaining comprehensive coverage.