Data quality issues

x_i_n_g · April 19, 2025, 3:17pm

I manually examined 10 “simple-recognition” examples (in their original order in “single-turn-public”) and found that 3 of them have issues:

“a5f78985-9afd-4a54-9c39-b9e815d2701c”: The correct answer should be KATU. Ref: KATU - Wikipedia
“09413189-16c7-4853-b17b-26b96d8a8d59”: The image clearly shows that “Note: One quarter-note is equivalent to one beat.”
“ff23c66f-8c4f-4b7e-946a-d3d6e3c41d1b”: This is an ambiguous question. In the context of piano, “first position” means “C position”, which is different from “rest position”. However, given the image, “first position” may refer to the position shown in the first picture?

I’m wondering if such data quality issues could impact the reliability of the validation sets, and even the final ranking if the blind sets have similar issues.

yilun_jin8 · April 24, 2025, 6:00am

This question has been forwarded to the Meta team. We will get back to you as soon as we hear more.

Jiaqi · May 14, 2025, 6:14pm

@x_i_n_g thank you for raising the issue! We will fix these in the next data release in the upcoming week.

We are constantly improving the data quality. The final ranking will be a combination of auto-eval and human grading to reduce bias and promote fairness.