Evaluation note
Confidence vs accuracy correlation
0.73 Pearson r • ACL v1.0 • 2025-11-24
Dataset / task: Cognitive MMLU (27 scenario-based questions, human graded)
Protocol: Pearson correlation between self-reported confidence and human-graded correctness, measured after ACL processing.
Sample size: n=27
System version: ACL v1.0
Measured: 2025-11-24
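The protocol above can be illustrated with a minimal sketch of the Pearson computation. It assumes each graded item yields a confidence in [0, 1] and a binary correctness label (1 = correct, 0 = incorrect); the note does not specify either encoding. The function name `pearson_r` and the six sample pairs are hypothetical and are not taken from the evaluation data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and squares (the 1/n factors cancel in r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical illustration data, NOT the 27 graded items from this note.
confidence = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2]  # assumed self-reported confidence
correct = [1, 1, 0, 0, 1, 0]                 # assumed human grade (1 = correct)

r = pearson_r(confidence, correct)
print(f"pearson r = {r:.2f}")
```

With a binary correctness label, this quantity is the same as the point-biserial correlation, so no separate formula is needed under that assumption.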
Limitations
- Small sample (n=27); expansion in progress
- Correlation measured on short-form answers; long-form not covered
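To gauge how much the small sample limits the reported figure, one common approach is an approximate 95% confidence interval from the Fisher z-transformation. The sketch below applies it to the reported r = 0.73 and n = 27. It is not part of the documented protocol, the function name `fisher_ci` is hypothetical, and the method assumes roughly bivariate-normal data, which holds only loosely if correctness is binary.

```python
import math


def fisher_ci(r, n, z_crit=1.959963984540054):
    """Approximate confidence interval for a Pearson r via Fisher's z.

    z_crit defaults to the two-sided 95% normal quantile.
    """
    if n <= 3:
        raise ValueError("Fisher interval requires n > 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)               # Fisher transformation of r
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


# Values taken from this note: r = 0.73, n = 27.
low, high = fisher_ci(0.73, 27)
print(f"approx. 95% CI for r: [{low:.2f}, {high:.2f}]")
```

At n = 27 the standard error on the z scale is 1/sqrt(24), about 0.20, so the interval narrows only slowly until the planned expansion adds substantially more items.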
Sources
TR-2025-25