Evaluation note

Confidence vs accuracy correlation

0.73 Pearson r • ACL v1.0 • 2025-11-24

Dataset / task: Cognitive MMLU (27 scenario-based questions, human graded)

Protocol: Pearson correlation between self-reported confidence and human-graded correctness, computed across the 27 questions after ACL processing.
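As a rough illustration of the protocol, the sketch below computes a Pearson correlation between per-question confidence scores and binary correctness labels. The function name `pearson_r`, the 0-to-1 confidence scale, the 0/1 coding of correctness, and the sample values are all assumptions made for illustration; they are not the evaluation data or the actual scoring code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when either sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only, NOT the evaluation data.
confidence = [0.9, 0.8, 0.4, 0.6, 0.3, 0.95]  # assumed 0-1 self-reported confidence
correct = [1, 1, 0, 1, 0, 1]                  # assumed coding: 1 = graded correct

r = pearson_r(confidence, correct)
print(f"r = {r:.2f}")
```

With a binary correctness variable, Pearson r is equivalent to the point-biserial correlation, so a positive value means that correct answers tended to carry higher stated confidence.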

Sample size: n=27
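With a sample this small, the uncertainty around the reported coefficient is worth keeping in view. The sketch below estimates an approximate 95% confidence interval from the reported r and n using the Fisher z-transformation. The helper name `fisher_ci` is hypothetical, and the method assumes roughly bivariate-normal data, which holds only loosely when one variable is binary correctness, so the interval should be read as indicative rather than as part of the reported result.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson r via the Fisher z-transformation.

    z_crit=1.96 corresponds to a two-sided 95% interval under a normal approximation.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)              # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)    # approximate standard error of z
    low = math.tanh(z - z_crit * se)
    high = math.tanh(z + z_crit * se)
    return low, high


# Values taken from this note: r = 0.73, n = 27.
low, high = fisher_ci(0.73, 27)
print(f"approx. 95% CI: [{low:.2f}, {high:.2f}]")
```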

System version: ACL v1.0

Measured: 2025-11-24

Limitations

TR-2025-25