What is the public test?

tereka · May 27, 2025, 1:17pm

In the office hour, I found the content below.
What is the public test? Is it the phase 1 public test?

Release v0.1.2 data and mock APIs
■ Validation and public test

Jiaqi · May 27, 2025, 6:37pm

@tereka
Here is the public test data: https://huggingface.co/datasets/crag-mm-2025/crag-mm-single-turn-public/viewer/default/public_test

The released version is v0.1.2. What was used in phase 1 evaluation / leaderboard is public test v0.1.1. The changes from v0.1.1 to v0.1.2 are covered in the release notes. Hope this is clear.
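
For anyone who wants to pull that split locally, here is a minimal sketch using the Hugging Face `datasets` library. The helper names are hypothetical, and passing the version string as `revision` is an assumption about how the release is tagged on the Hub; check the release notes if it does not resolve.

```python
# Repo id and split name are taken from the dataset URL above.
REPO_ID = "crag-mm-2025/crag-mm-single-turn-public"
SPLIT = "public_test"


def public_test_kwargs(revision="v0.1.2"):
    """Build keyword arguments for datasets.load_dataset.

    Assumption: the release is tagged on the Hub with a revision
    named after the version string (e.g. "v0.1.2").
    """
    return {"path": REPO_ID, "split": SPLIT, "revision": revision}


def load_public_test(revision="v0.1.2"):
    """Download and return the public_test split (requires network access)."""
    # Imported lazily so public_test_kwargs works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(**public_test_kwargs(revision))
```

Calling `load_public_test()` would return the released public_test split. It would not return the v0.1.1 public test that was used for the phase 1 leaderboard.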

Chris_Deotte · May 27, 2025, 7:03pm

I am confused. Does the public_test split of the crag-mm-single-turn-public dataset (v0.1.2) contain the same questions that are asked when we submit to the phase 2 leaderboard? And if so, will our models be re-evaluated on a private test before the top 10 teams are chosen?

Because if LB questions are made available, can’t models memorize the answers to phase 2 leaderboard questions before submission and achieve perfect scores during submission on the leaderboard in phase 2?

(Note: In phase 1, we did not have access to the phase 1 leaderboard questions. And I still can’t find v0.1.1 public test in any of the datasets on HuggingFace).

Jiaqi · May 27, 2025, 7:20pm

@Chris_Deotte

There are 3 data splits (validation, public_test, private_test) and 2 release versions (v0.1.1 and v0.1.2).

In phase 1, we released validation v0.1.1 and used public_test v0.1.1 for leaderboard.
In phase 2, validation and public_test v0.1.2 are released. Leaderboard grading was switched to private_test v0.1.2.

And confirming - we did not release v0.1.1 public test.
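
As a rough sketch, the setup above can be written down as a small lookup table. The structure and function name are purely illustrative (not part of any official tooling); the contents are transcribed from the two lines describing phase 1 and phase 2.

```python
# phase -> which (split, version) pairs were released to participants,
# and which (split, version) pair the leaderboard is graded on.
RELEASES = {
    1: {
        "released": [("validation", "v0.1.1")],
        "leaderboard": ("public_test", "v0.1.1"),
    },
    2: {
        "released": [("validation", "v0.1.2"), ("public_test", "v0.1.2")],
        "leaderboard": ("private_test", "v0.1.2"),
    },
}


def leaderboard_is_hidden(phase):
    """Return True if the leaderboard data for `phase` was not released."""
    info = RELEASES[phase]
    return info["leaderboard"] not in info["released"]


for phase in sorted(RELEASES):
    split, version = RELEASES[phase]["leaderboard"]
    print(f"phase {phase}: graded on {split} {version}, "
          f"hidden={leaderboard_is_hidden(phase)}")
```

In both phases the leaderboard questions are hidden from participants, so the released public_test v0.1.2 cannot be used to memorize phase 2 leaderboard answers.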

Does this help to clarify?

Chris_Deotte · May 27, 2025, 7:22pm

Yes, it is clear now. Thank you.

tereka · May 27, 2025, 11:53pm

Thank you! I understand it.