In the office hour, I found this content:
What’s a public test? Phase 1 public test?
Release v0.1.2 data and mock APIs
■ Validation and public test
@tereka
Here is the public test data: https://huggingface.co/datasets/crag-mm-2025/crag-mm-single-turn-public/viewer/default/public_test
The released version is v0.1.2. The phase 1 evaluation and leaderboard used public test v0.1.1. The changes from v0.1.1 to v0.1.2 are covered in the release notes. Hope this is clear.
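For anyone who wants to pull this split locally, here is a minimal sketch using the Hugging Face `datasets` library. The dataset ID and split name come from the URL above. Passing `revision="v0.1.2"` assumes the release is tagged under that name on the Hub, which is not confirmed in this thread, so please check the dataset page. `load_public_test` is a hypothetical helper name.

```python
# Identifiers taken from the dataset URL posted in this thread.
DATASET_ID = "crag-mm-2025/crag-mm-single-turn-public"
SPLIT = "public_test"
# Assumption: the release is tagged "v0.1.2" on the Hub; verify on the dataset page.
REVISION = "v0.1.2"


def load_public_test(revision=REVISION):
    """Download the public_test split (needs network access and `pip install datasets`)."""
    # Imported lazily so this file can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=SPLIT, revision=revision)


# Usage (downloads data, so it is left commented out):
# ds = load_public_test()
# print(ds)
```

If the tag name turns out to be different, passing `revision=None` falls back to the default branch of the dataset repository.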
I am confused. Is the public_test split of the crag-mm-single-turn-public dataset (v0.1.2) the same set of questions that is asked when we submit to the phase 2 leaderboard? And if so, will our models be re-evaluated on a private test before the top 10 teams are chosen?
If the leaderboard questions are made available, couldn’t models memorize the answers before submission and achieve perfect scores on the phase 2 leaderboard?
(Note: In phase 1, we did not have access to the phase 1 leaderboard questions, and I still can’t find the v0.1.1 public test in any of the datasets on HuggingFace.)
There are 3 data splits (validation, public_test, private_test) and 2 release versions (v0.1.1, v0.1.2).
In phase 1, we released validation v0.1.1 and used public_test v0.1.1 for the leaderboard.
In phase 2, validation and public_test v0.1.2 are released. Leaderboard grading was switched to private_test v0.1.2.
And to confirm: we did not release the v0.1.1 public test.
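To make the split and version layout easier to check, here is a small sketch that encodes the statements above as data. The names `PhaseInfo`, `PHASES`, and `leaderboard_is_public` are illustrative only and are not part of any official tooling.

```python
from typing import NamedTuple, Tuple


class PhaseInfo(NamedTuple):
    version: str                      # data release version used in that phase
    released_splits: Tuple[str, ...]  # splits participants could download
    leaderboard_split: str            # split used for leaderboard grading


# Facts as stated in this thread.
PHASES = {
    1: PhaseInfo("v0.1.1", ("validation",), "public_test"),
    2: PhaseInfo("v0.1.2", ("validation", "public_test"), "private_test"),
}


def leaderboard_is_public(phase: int) -> bool:
    """True if the split graded on the leaderboard was released in that phase."""
    info = PHASES[phase]
    return info.leaderboard_split in info.released_splits


for phase, info in PHASES.items():
    print(
        f"phase {phase}: graded on {info.leaderboard_split} {info.version}, "
        f"released to participants: {leaderboard_is_public(phase)}"
    )
```

In both phases the graded split is one that was not released at that version, which is why memorizing the released public_test v0.1.2 answers would not help on the phase 2 leaderboard.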
Does this help to clarify?
Yes, it is clear now. Thank you.
Thank you! I understand it.