@snehananavati Could you check task temporal-alignment-track for me? I can’t submit my code to GitLab because I got this error (I clicked and accepted all the rules).
sub hash: 6ded30585b5f2672b9c3c07888d7dce4a25d9dc2
Submission failed : You have not qualified for this round. Please review the challenge rules at www.aicrowd.com
Dear @aicrowd_team,
Thank you for this great competition.
I have two questions.
What is the exact eval metric?
According to the leaderboard, the eval metric seems to be av_align, but for me it matches the TA score for submission #280900 and the AV_ALIGN score for submission #280380. Which one is the correct one?
Can we choose two submissions at the end of the competition?
Hello, can you please specify the track for your first query? Edit: As for your second query, at the end of the competition (subjective evaluation), only the best system from each participant is used, even if three systems from the same participant are ranked 1st–3rd.
I didn’t understand the meaning of “even if three systems from the same participant”. The competition is in phase 2 (stage 2) now. Do you mean that the best system (submission) across both phase 1 and phase 2 will be used for the final? There was a warm-up stage, but we can’t see the warm-up stage results now.
We use AV-Align as the main metric for ranking and the CAVP score as the secondary metric to break ties. The other four metrics are used to exclude entries that provide low-quality data from the ranking. Specifically, if a submitted model fails to meet the threshold on any one of these four metrics, it is excluded from the ranking. The thresholds are set as follows: 2.0 for FAD, 900 for FVD, 0.25 for the LanguageBind text-audio score, and 0.12 for the LanguageBind text-video score.
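For illustration only, here is a minimal sketch of this filter-then-rank logic. The field names are hypothetical, and it assumes FAD/FVD are lower-is-better while the LanguageBind, AV-Align, and CAVP scores are higher-is-better; this is not the actual evaluation code.

```python
# Minimal sketch of the filter-then-rank logic (illustrative only).

THRESHOLDS = {
    "fad": 2.0,             # assumed lower-is-better: exclude if above 2.0
    "fvd": 900,             # assumed lower-is-better: exclude if above 900
    "lb_text_audio": 0.25,  # assumed higher-is-better: exclude if below 0.25
    "lb_text_video": 0.12,  # assumed higher-is-better: exclude if below 0.12
}

def qualifies(sub: dict) -> bool:
    """A submission is excluded if it fails any one of the four quality gates."""
    return (
        sub["fad"] <= THRESHOLDS["fad"]
        and sub["fvd"] <= THRESHOLDS["fvd"]
        and sub["lb_text_audio"] >= THRESHOLDS["lb_text_audio"]
        and sub["lb_text_video"] >= THRESHOLDS["lb_text_video"]
    )

def rank(submissions: list[dict]) -> list[dict]:
    """Rank qualifying submissions by AV-Align (primary),
    breaking ties with the CAVP score (secondary)."""
    return sorted(
        (s for s in submissions if qualifies(s)),
        key=lambda s: (s["av_align"], s["cavp"]),
        reverse=True,
    )
```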
We might have misunderstood your question, but let us clarify what we meant. As explained in the challenge rules,
The top entries in the final leaderboard will be assessed by human evaluation, and the award winning teams will be selected based only on the results of this subjective evaluation.
We plan to assess the top 10 models chosen according to av_align. In the 2nd phase you can submit multiple systems. However, even if you occupy places 1 through 10 on the final leaderboard, we will assess only your top-1 model and pick the remaining 9 models from other participants, so that we can reward as many participants as possible.
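For illustration, a minimal sketch of this one-model-per-participant selection (hypothetical field names; not the actual selection code):

```python
def select_for_human_eval(leaderboard: list[dict], k: int = 10) -> list[dict]:
    """Pick k systems for subjective evaluation from a leaderboard that is
    already sorted by av_align (best first), at most one per participant."""
    selected, seen = [], set()
    for entry in leaderboard:
        if entry["participant"] in seen:
            continue  # only a participant's top-1 system is assessed
        selected.append(entry)
        seen.add(entry["participant"])
        if len(selected) == k:
            break
    return selected
```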
@snehananavati @aicrowd_team Hello, for Track 2, would it be possible to first generate a video using an unconditional video generation model and then synthesize the corresponding audio using a video-to-audio model?