About Test Set Leakage in Round 1

jinwei_luo · May 16, 2024, 8:37am

Hello everyone,

I’ve noticed an issue with the previous competition round. The logs from the inference code appear to be accessible, which causes significant data leakage: every query and its search results can be printed. I’d therefore like to ask whether the organizers plan to release the Round 1 test set publicly, so that no participants gain extra insight into the dataset that others lack. This matters all the more because this dataset may share the same distribution as future test sets.

liberifatali · May 17, 2024, 3:20pm

@aicrowd_team I suggest that the Round 2 test set be truly private and share neither content nor distribution with the Round 1 data.

jiazunchen · May 17, 2024, 4:31pm

In fact, the Round 1 test set is the same dataset that was given to us, so there is no leakage problem.

normanbai · May 20, 2024, 7:36am

@aicrowd_team
It has come to our attention that certain teams have gained unfair access to the test dataset, giving them an advantage in the competition. This not only undermines the interests of all participating teams but also raises concerns about your professionalism and authority.

We request that you take immediate action to rectify this situation and eliminate any unfair advantage these teams have gained. If this issue is not adequately addressed, we and other teams will raise awareness of these mistakes and of our concerns about the professionalism and integrity of the competition.

We trust that you understand the gravity of this situation and the potential consequences it may have on the reputation and future of the KDD RAG Competition. We urge you to take prompt and decisive action to restore trust among participants and uphold the principles of fairness and integrity.

We look forward to your prompt response and to the actions you take to address this issue effectively.