Evaluation note

Confidence vs accuracy correlation

Result: Pearson r = 0.73

Dataset / task: Cognitive MMLU (27 scenario-based questions, human-graded)

Protocol: Pearson correlation between self-reported confidence and answer correctness, measured after ACL processing.
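The correlation step can be sketched as follows. This is a minimal sketch, not the ACL evaluation code. It assumes each answer yields a self-reported confidence in [0, 1] and a human-graded correctness score coded 1 (correct) or 0 (incorrect). With binary correctness, Pearson r is equivalent to the point-biserial correlation. The helper name `pearson_r` and the toy values are hypothetical and are not the n=27 evaluation data.

```python
import math
from typing import Sequence


def pearson_r(confidence: Sequence[float], correct: Sequence[float]) -> float:
    """Pearson correlation between self-reported confidence and graded correctness.

    Hypothetical helper: with correctness coded 0/1 this equals the point-biserial r.
    """
    n = len(confidence)
    if n != len(correct) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_c = sum(confidence) / n
    mean_k = sum(correct) / n
    # Sum of centered cross-products (covariance numerator)
    cov = sum((c - mean_c) * (k - mean_k) for c, k in zip(confidence, correct))
    # Sums of squared deviations (variance numerators)
    var_c = sum((c - mean_c) ** 2 for c in confidence)
    var_k = sum((k - mean_k) ** 2 for k in correct)
    if var_c == 0 or var_k == 0:
        raise ValueError("correlation is undefined when either variable is constant")
    return cov / math.sqrt(var_c * var_k)


# Toy example: hypothetical values, not the evaluation data
conf = [0.9, 0.8, 0.4, 0.7, 0.2, 0.95]
graded = [1, 1, 0, 1, 0, 1]
print(f"r = {pearson_r(conf, graded):.2f}")
```

A positive r means higher stated confidence tends to accompany correct answers; it says nothing about whether the confidence values are calibrated in absolute terms.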

Sample size: n=27

System version: ACL v1.0

Measured: 2025-11-24

Limitations

  • Small sample; expansion in progress
  • Correlation measured on short-form answers; long-form not covered
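To make the small-sample limitation concrete, here is a sketch of an approximate 95% confidence interval for r using the Fisher z-transformation, plugging in the reported r = 0.73 and n = 27. The approximation assumes roughly bivariate-normal data, which a binary correctness score does not satisfy, so the interval is indicative only. The function name `fisher_ci` is hypothetical.

```python
import math


def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a Pearson r via Fisher's z-transform."""
    if n <= 3:
        raise ValueError("Fisher CI needs n > 3")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)               # transform r to an approximately normal scale
    se = 1 / math.sqrt(n - 3)       # standard error on the z scale
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to the r scale


low, high = fisher_ci(0.73, 27)
print(f"95% CI for r: [{low:.2f}, {high:.2f}]")
```

Under these assumptions the interval spans roughly 0.48 to 0.87, which illustrates why a larger sample would tighten the estimate.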

Sources

TR-2025-25
Thynaptic | Cognitive AI Research