I analyzed the dataset and found the following:
- The val dataset contains 9 examples with an incorrect label (189, 198, 866, 876, 1326, 1392, 2198, 2942, 3453).
- The train dataset contains 111 examples with an incorrect label (480, 777, 863, …), 2 examples with a "clear board" (39154, 39394), and 5 examples with a "black board", where the black pawns are not visible (15535, 16587, 32283, 34999, 38336).
Maybe the test dataset also contains incorrect labels, in which case a 100% score would not be achievable?
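
In case it helps anyone, here is a minimal sketch of how the flagged examples could be dropped before training. It assumes the numbers above are zero-based row indices into each split, which may not match how you load the data. The helper `keep_indices` and the set names are hypothetical, and the train mislabel set holds only the three indices shown above, since the full list of 111 is not reproduced here.

```python
# Flagged indices copied from the lists above.
VAL_MISLABELED = {189, 198, 866, 876, 1326, 1392, 2198, 2942, 3453}
# Partial: only the three indices shown in the post, not all 111.
TRAIN_MISLABELED_KNOWN = {480, 777, 863}
TRAIN_CLEAR_BOARD = {39154, 39394}
TRAIN_BLACK_BOARD = {15535, 16587, 32283, 34999, 38336}


def keep_indices(n_examples, *bad_sets):
    """Return the indices in range(n_examples) that are in none of bad_sets.

    Assumes the flagged numbers are zero-based row indices.
    """
    bad = set().union(*bad_sets)
    return [i for i in range(n_examples) if i not in bad]


# Usage: n_val is a placeholder; replace it with the real split size.
n_val = 4000
val_keep = keep_indices(n_val, VAL_MISLABELED)
```

The resulting index list can then be used to subset whatever array or dataset object holds the split. Whether to drop the bad examples or relabel them by hand is a separate choice; dropping is just the simpler option.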