I manually examined 10 “simple-recognition” examples (in their original order from “single-turn-public”) and found that 3 of them have issues:
- “a5f78985-9afd-4a54-9c39-b9e815d2701c”: The correct answer should be KATU. Ref: KATU - Wikipedia
- “09413189-16c7-4853-b17b-26b96d8a8d59”: The image clearly shows that “Note: One quarter-note is equivalent to one beat.”
- “ff23c66f-8c4f-4b7e-946a-d3d6e3c41d1b”: This question is ambiguous. In the context of piano, “first position” means “C position”, which is different from “rest position”. However, given the image, “first position” may instead refer to the position shown in the first picture.
I’m wondering whether such data quality issues could affect the reliability of the validation sets, and even the final ranking if the blind sets have similar issues.