Please note that for the final winner selection, a constraint on the missing/refusal rate will be applied. Solutions with high missing rates may be disqualified, even if other metrics are strong.
What specifically counts as a high missing rate?
The high missing rate needs extra clarification, either as a hard constraint or fused into the final metric. Since it will significantly impact strategies for whether to provide an answer, it might be better to extend the competition by one or two weeks. Personally, I do not think it is a good idea to change the rules at this point.
Hi Participants,
To provide more context on this important update:
The Missing Rate for a model refers to the rate at which the model refuses to answer a question by saying, “I don’t know”. We have noticed that many of the current top solutions have a Missing Rate close to 90%, which is clearly unintended, even if encouraged by the evaluation metric used for the current leaderboards. This is also very much against the spirit of the competition.
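For concreteness, here is a minimal sketch of how a missing rate along these lines could be computed. The `REFUSAL_PHRASES` set and the string-matching heuristic are illustrative assumptions, not the official evaluation code:

```python
# Minimal sketch of a missing-rate computation. The refusal phrases
# and the matching rule are assumptions for illustration, not the
# official evaluation logic.
REFUSAL_PHRASES = {"i don't know", "i dont know"}

def missing_rate(answers: list[str]) -> float:
    """Return the fraction of answers that are refusals."""
    if not answers:
        return 0.0
    refusals = sum(
        1 for a in answers if a.strip().lower() in REFUSAL_PHRASES
    )
    return refusals / len(answers)

# Example: 9 refusals out of 10 answers gives a missing rate of 0.9.
answers = ["I don't know"] * 9 + ["Paris"]
print(missing_rate(answers))  # 0.9
```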
We would like to remind the participants that the final winners will not be determined by the Round 2 leaderboard scores, but by the result of the final Human Evaluation phase.
The Human Annotators understand that the models they are evaluating are supposed to be meaningful QA systems, and we believe that brute-force optimization of the Missing Rate will lead to lower scores in the final Human Evaluation phase.
Best,
Meta Organizers
How do we submit the final solution? Will we have a chance to submit a final version at the end?
What’s the value of the constraint?
@Jiaqi But only the top 10 teams get human annotation, and on the current leaderboard it’s impossible to reach the top 10 without a high-missing-rate strategy. If the leaderboard metric remains unchanged, there is no way to decrease the missing rate without falling out of contention, which makes no sense.
@Jiaqi
Please explain the details of your evaluation announcement.
- In Single-Source Augmentation, almost every team’s missing rate is over 90% (e.g., my team’s is almost 91%). Would almost every team now be disqualified on Single-Source Augmentation under the human evaluation metrics? Is that correct?
- Please tell us the approximate missing rate threshold. Without a benchmark, I can’t know whether the model I’ve trained is good enough.
@Jiaqi
Please consider stating all of your potential requirements clearly, because the competition time is very limited. Every new requirement or rule from the organizers forces us to readjust our pipeline, adapt the dataset, modify the model architecture, retrain the model, and optimize inference in the specified environment. This is very time-consuming.
So, what’s the value of the constraint?
Can we choose our final solution from our submissions? Our Task 1 shows a missing rate over 0.9 on the leaderboard now, but if possible, we’d like to select a submission with a missing rate below 0.9.
So what are the specific standards?
Yes, you will. As with the previous challenges, we will send out a webform asking you to indicate your final submission(s). It will have to be indexed via a valid submission ID, though.