- How do we select our final submissions? (in case we believe the private test set differs from the public test set)
- What are the time constraints on submission stages? (training, inference)
- It is said that “Note: The scores used to compute the current leaderboard are tentative scores computed on 60% of the test datasets. The final scores, computed on the complete test datasets, will be released at the end of the competition.” So will the final standings be based on 40% or 100% of the test set? Is it correct that we currently see scores computed on 60% of the 1473 samples?
Hello team =) Could you please respond or tag a person to whom these questions should be addressed?
You do not need to select your final submission. All submissions are also evaluated on the private dataset. For the final leaderboard, the best score (on 100% of the test set) among all your submissions will be considered.
During an evaluation, only the inference part of the code is run. We do not run the training part. For the inference part, the current timeout is 15 minutes.
The final standing will be based on 100% of the test set. For now, the score visible is 60% of 1473 samples.
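To make the arithmetic concrete, here is a minimal sketch of how the public and full-test-set numbers relate. It assumes the metric is a plain per-sample average (the thread does not state the actual metric) and that the 60% split is rounded to the nearest sample; `split_sizes` and `full_score` are hypothetical helper names, not part of the competition's evaluator.

```python
TOTAL_SAMPLES = 1473      # test set size mentioned in the thread
PUBLIC_FRACTION = 0.60    # share used for the tentative leaderboard


def split_sizes(total, public_fraction):
    """Return (public, private) sample counts, assuming nearest-sample rounding."""
    public = round(total * public_fraction)
    return public, total - public


def full_score(public_score, private_score, n_public, n_private):
    """Score on 100% of the test set, assuming a per-sample averaged metric."""
    weighted = public_score * n_public + private_score * n_private
    return weighted / (n_public + n_private)


n_public, n_private = split_sizes(TOTAL_SAMPLES, PUBLIC_FRACTION)
print(n_public, n_private)

# Illustrative scores only: a model that does better on the public part
# than on the hidden part still keeps most of its public advantage.
print(round(full_score(0.80, 0.70, n_public, n_private), 4))
```

Under these assumptions, roughly 60% of the final score is already visible on the current leaderboard, which is why the later posts worry about tuning to the public part.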
“For now, the score visible is 60% of 1473 samples.”
Can you confirm that participants will only be scored on the hidden 40% of the test set?
Unfortunately, as it stands, the staff has confirmed multiple times that the final score will be on 100% of the test set, which includes the public 60%.
Funny enough, the original poster of this thread seems to be doing exactly what you are mentioning.
I wish it were otherwise… One suggestion is to shuffle the test set and re-run our submissions.
Yeah, unfortunately the current rules lead to two things:
A) Fine-tuning to the Public LB.
B) Submitting as many models as possible, with as much variance as you can get away with while maintaining a relatively high score, because ALL of your models will be scored for the Private LB. All you need is 1 of your 200 submissions to get lucky. Each submission is basically a lottery ticket.
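The lottery-ticket effect in B) can be illustrated with a small simulation. This is only a sketch under loud assumptions: every submission has the same true score, its measured score is that value plus independent Gaussian noise, and the numbers (0.75, a noise of 0.01) are made up for illustration. `best_of_n` and `mean_best` are hypothetical names.

```python
import random


def best_of_n(true_score, noise_sd, n_submissions, rng):
    """Best measured score among n equally good, independently noisy submissions."""
    return max(rng.gauss(true_score, noise_sd) for _ in range(n_submissions))


def mean_best(true_score, noise_sd, n_submissions, trials=2000, seed=0):
    """Average 'best score' over many simulated competitions."""
    rng = random.Random(seed)
    total = sum(
        best_of_n(true_score, noise_sd, n_submissions, rng) for _ in range(trials)
    )
    return total / trials


# Assumed values, chosen only to show the direction of the effect.
one = mean_best(0.75, 0.01, n_submissions=1)
many = mean_best(0.75, 0.01, n_submissions=200)
print(f"best of 1 submission:    {one:.4f}")
print(f"best of 200 submissions: {many:.4f}")
```

With identical underlying quality, the best of 200 draws lands well above the best of 1, so taking the maximum over all submissions rewards submission count as well as model quality. Real submissions are correlated, so the actual gain would likely be smaller than in this idealized setup.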
However, I’d say A) can be split into two categories.
A2) manually modifying predictions by probing
A2 seems to be the best way to be top 1.
If the competition had gone on for another week without any clarification, I might have been tempted to jump on the dark side of the force.
However, A2) has now been rightly addressed in this post.