Evaluation

Evaluation notes

Every publicly reported metric must point to an evaluation note. Each note includes the dataset or task definition, the evaluation protocol, the sample size (where applicable), the system version, and known limitations.
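One way to make this rule checkable is to treat a note as a small record and verify both the note's contents and the metric-to-note links. The sketch below is illustrative only: the names `EvaluationNote`, `missing_fields`, and `unlinked_metrics`, and the idea of identifying notes by a `note_id` string, are assumptions and not part of any existing tooling.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class EvaluationNote:
    """Hypothetical record holding the fields a note is expected to include."""
    note_id: str                       # assumed identifier, used for linking
    task_definition: str               # dataset/task definition
    protocol: str                      # how the evaluation was run
    system_version: str                # version of the system evaluated
    limitations: str                   # known limitations of the result
    sample_size: Optional[int] = None  # only where applicable


def missing_fields(note: EvaluationNote) -> List[str]:
    """Return the names of required text fields that are empty or blank."""
    required = ("task_definition", "protocol", "system_version", "limitations")
    return [name for name in required if not getattr(note, name).strip()]


def unlinked_metrics(
    metrics: Dict[str, Optional[str]],
    notes: Dict[str, EvaluationNote],
) -> List[str]:
    """Return public metric names whose note_id is absent or unknown.

    `metrics` maps a metric name to the note_id it points to (or None).
    """
    return sorted(name for name, note_id in metrics.items() if note_id not in notes)
```

A check like this could run before publication: a non-empty result from either function means a metric lacks a note or a note is incomplete. Sample size is left optional here because the text requires it only where applicable.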