Mavaia Cloud Routing

Cloud-only inference performance evaluation using the Ollama Cloud API. Documents 94.83% MMLU accuracy at 4,057ms average end-to-end latency using gpt-oss:20b.

Report ID: TR-2025-03

Type: Technical Analysis

Date: 2025-11

Version: v1.0.0

Authors: Infrastructure Team

Abstract

We present a performance evaluation of Mavaia's Cloud Routing, focusing on cloud-only inference through the Ollama Cloud API as invoked by the HybridBridgeService. Using gpt-oss:20b, cloud routing achieves 94.83% MMLU accuracy with 4,057ms average end-to-end latency.

1. Introduction

Mavaia's HybridBridgeService routes queries to cloud models when local inference is insufficient or when users explicitly request enhanced capabilities. Cloud routing uses the Ollama Cloud API to access a larger model (gpt-oss:20b, 20B parameters) than the local models (qwen3:1.7b and granite3.2:2b; 1.7-4B parameters). This analysis evaluates cloud-only performance to establish the capability ceiling and the latency characteristics that apply when network dependency is acceptable. Cloud routing serves the 3.3% of queries that exceed local model capabilities, which lets Mavaia maintain quality on complex tasks while preserving privacy for the 96.7% majority handled locally.

2. Methodology

The evaluation runs gpt-oss:20b through the Ollama Cloud API on standard benchmarks and on production queries. Accuracy is measured on MMLU (Massive Multitask Language Understanding), which tests factual knowledge across 57 subjects, and on production queries scored by user-validated quality assessment. Latency is measured as network round-trip time, model inference duration, and total end-to-end response time including ACL pipeline processing. The comparison baseline is local-only routing with qwen3:1.7b and granite3.2:2b on identical queries. Reliability is assessed through failure-rate analysis, network timeout handling, and graceful degradation behavior.
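The measurement loop described above can be sketched as follows. This is a minimal illustration and not the harness used for this report: the `evaluate` function, the `ask` callable, and the exact-match scoring rule are all assumptions introduced here, and the sketch times only the end-to-end call without separating network, inference, and ACL pipeline components.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable, Tuple


def evaluate(ask: Callable[[str], str],
             items: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Score a model callable on (question, expected_answer) pairs.

    `ask` is a hypothetical stand-in for whatever sends a query to the
    model (cloud or local) and returns its answer as text.
    """
    latencies_ms = []
    total = correct = failures = 0
    for question, expected in items:
        total += 1
        start = time.perf_counter()
        try:
            answer = ask(question)
        except Exception:
            # Any raised error (timeout, network failure) counts as a failure.
            failures += 1
            continue
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        # Assumed scoring rule: case-insensitive exact match on the answer.
        if answer.strip().upper() == expected.strip().upper():
            correct += 1
    return {
        "total": total,
        "failures": failures,
        # Failed requests count as incorrect, so accuracy is over all items.
        "accuracy": correct / total if total else 0.0,
        "completion_rate": (total - failures) / total if total else 0.0,
        "mean_latency_ms": mean(latencies_ms) if latencies_ms else 0.0,
    }
```

Running the same item set through a cloud callable and a local callable gives directly comparable accuracy, completion-rate, and latency figures, which is the shape of the comparison this report draws.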

3. Results

Cloud routing achieves 94.83% MMLU accuracy versus 68-72% for the local models, a 23-27 percentage point improvement. Latency breaks down as 180-320ms network round-trip (variable by connectivity) and 2,800-3,200ms model inference, with a 4,057ms average end-to-end total including the ACL pipeline. Compared with local routing, cloud routing gains 23-27 percentage points of accuracy, adds about 2.2s of latency, and depends on the network for 100% of requests versus 0% for local. Reliability: 97.3% of requests complete successfully; the remaining 2.7% hit timeout or network failures and require a retry or local fallback. Cost: $0.02-$0.05 per cloud-routed query versus zero marginal cost for local inference.
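The arithmetic behind these figures can be checked with a short script. Every input below is a number quoted in this report (the 3.3% cloud share comes from the Introduction); the function names are illustrative only. Two caveats apply: subtracting component ranges from an average total is only indicative, and attributing the remainder to the ACL pipeline and other overhead is an assumption, since the report does not state that figure separately.

```python
# All inputs are figures quoted in this report; nothing here is newly measured.
CLOUD_MMLU = 94.83            # gpt-oss:20b MMLU accuracy, percent
LOCAL_MMLU = (68.0, 72.0)     # local model accuracy range, percent
NETWORK_MS = (180, 320)       # network round-trip range
INFERENCE_MS = (2800, 3200)   # model inference range
TOTAL_MS = 4057               # average end-to-end latency
CLOUD_SHARE = 0.033           # fraction of queries routed to cloud
COST_USD = (0.02, 0.05)       # cost range per cloud-routed query


def accuracy_gain_pp(cloud, local_range):
    """Return the (min, max) gain in percentage points over the local range."""
    low, high = local_range
    return (round(cloud - high, 2), round(cloud - low, 2))


def residual_ms(total, *components):
    """Return the (min, max) latency not explained by the given components."""
    smallest = sum(c[0] for c in components)
    largest = sum(c[1] for c in components)
    return (total - largest, total - smallest)


def blended_cost_per_1000(share, cost_range):
    """Return the (min, max) cloud cost per 1,000 total queries, in USD."""
    return tuple(round(1000 * share * c, 2) for c in cost_range)
```

With the report's numbers, the accuracy gain comes out at 22.83 to 26.83 percentage points, matching the stated 23-27 point range. Network plus inference accounts for 2,980-3,520ms of the 4,057ms average, leaving roughly 537-1,077ms unattributed. At a 3.3% cloud share, the per-query cost averages out to about $0.66-$1.65 per 1,000 total queries.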

4. Discussion

The analysis shows a clear capability-latency trade-off. The 23-27 percentage point accuracy improvement indicates that the larger cloud model provides genuine benefit for complex queries beyond local model capabilities. The 4.1 second average latency is a meaningful UX degradation compared with 1.8 seconds for local routing, but remains acceptable for complex queries where users expect longer processing. The 97.3% completion rate shows that network dependency introduces occasional failures that purely local systems avoid. The $0.02-$0.05 per-query cost motivates Mavaia's 96.7% local routing rate, since excessive cloud usage would impose unsustainable costs. Together these results support the HybridBridgeService architecture: attempt local inference first, escalate to the cloud only when necessary, and provide a best-effort local fallback when the cloud is unavailable.
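The local-first policy with cloud escalation and local fallback can be sketched as below. This is an illustration under stated assumptions, not HybridBridgeService's actual code: `route_query`, `RouteResult`, the `is_sufficient` predicate, and the retry count are all hypothetical names and choices introduced for this example, and the report does not describe how sufficiency is actually decided.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RouteResult:
    answer: str
    route: str  # "local", "cloud", or "local-fallback"


def route_query(query: str,
                local: Callable[[str], str],
                cloud: Callable[[str], str],
                is_sufficient: Callable[[str, str], bool],
                max_cloud_attempts: int = 2) -> RouteResult:
    """Try local inference first; escalate to cloud only if needed.

    `local`, `cloud`, and `is_sufficient` are hypothetical callables
    supplied by the caller; none of them is a real Mavaia or Ollama API.
    """
    local_answer = local(query)
    if is_sufficient(query, local_answer):
        return RouteResult(local_answer, "local")

    # Escalate: retry the cloud call on timeout or network failure.
    for _ in range(max_cloud_attempts):
        try:
            return RouteResult(cloud(query), "cloud")
        except (TimeoutError, ConnectionError):
            continue

    # Cloud unavailable: return the local answer as a best-effort fallback.
    return RouteResult(local_answer, "local-fallback")
```

Keeping the local answer in hand before escalating means a cloud outage costs only the retry time, not a second local inference pass. The trade-off is that every escalated query pays the local latency first.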

5. Limitations

Cloud routing limitations include: (1) network latency varies with connectivity quality (180-320ms range), which affects consistency; (2) cloud model selection is limited to Ollama Cloud offerings, with no intelligent provider routing across multiple services; (3) cost modeling does not account for variable pricing across cloud providers; (4) reliability measurements do not capture degraded network conditions (slow but functional); (5) the accuracy improvement is measured on MMLU and may not generalize to all query types; (6) the analysis does not evaluate the privacy implications of cloud routing for sensitive queries.

6. Conclusion

Mavaia's Cloud Routing enhances capability through conditional escalation to larger cloud models when local inference is insufficient. The 94.83% MMLU accuracy versus 68-72% for local models is a genuine capability gain, while the 4.1 second average latency and 97.3% completion rate are acceptable for the complex queries that cloud routing serves. The analysis supports Mavaia's hybrid architecture: prioritize local privacy and speed (96.7% of queries) and escalate to cloud capability when needed (3.3% of queries), matching the performance-privacy-cost trade-off to query complexity. Future work will focus on multi-provider cloud routing, reduced latency through predictive escalation, improved offline degradation, and user-controlled privacy policies for cloud routing decisions.

Keywords

Cloud Routing, Benchmarks