@snehananavati Could you check this for me in the temporal-alignment-track task? I can’t submit my code to GitLab because I got this error (I clicked and accepted all the rules).
sub hash: 6ded30585b5f2672b9c3c07888d7dce4a25d9dc2
Submission failed : You have not qualified for this round. Please review the challenge rules at www.aicrowd.com
Dear @aicrowd_team,
Thank you for this great competition.
I have two questions.
What is the exact eval score?
According to the leaderboard, the eval metric seems to be av_align, which matches the TA score for #280900 but matches the AV_ALIGN score for #280380 in my case. Which one is correct?
Can we choose two submissions at the end of the competition?
Hello, can you please specify the track for your first query? Edit: As for your second query, at the end of the competition (the subjective evaluation), the best system from each participant is used, even if three systems from the same participant are ranked 1st-3rd.
I didn’t understand what “even if three systems from the same participant” means. The competition is in Phase 2 now. Do you mean that the best system (submission) from both Phase 1 and Phase 2 will be used for the final? There was also a warm-up stage, but we can’t see the warm-up results now.
We use the AV-Align as the main metric for ranking and the CAVP score as the secondary metric to break ties. The other four metrics are used to exclude entries that provide low-quality data from the ranking. Specifically, if the score of the submitted model does not exceed the threshold value in any one of these four metrics, the model is excluded from the ranking. The threshold is set as follows: 2.0 for FAD, 900 for FVD, 0.25 for LanguageBind text-audio score, and 0.12 for LanguageBind text-video score.
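For intuition, here is a minimal sketch of how such a filter-then-rank step could look. The metric field names, the assumed score directions (FAD/FVD lower is better, LanguageBind scores higher is better), and the helper functions are illustrative assumptions, not the organisers’ actual evaluation code.

```python
# Minimal sketch of the filter-then-rank logic described above.
# NOTE: metric field names, score directions, and helper names are assumptions
# for illustration only, not the organisers' actual evaluation code.

THRESHOLDS = {
    "fad": 2.0,             # assumed: lower FAD is better
    "fvd": 900.0,           # assumed: lower FVD is better
    "lb_text_audio": 0.25,  # assumed: higher LanguageBind text-audio score is better
    "lb_text_video": 0.12,  # assumed: higher LanguageBind text-video score is better
}

def qualifies(scores: dict) -> bool:
    """True if a submission clears all four quality thresholds."""
    return (
        scores["fad"] < THRESHOLDS["fad"]
        and scores["fvd"] < THRESHOLDS["fvd"]
        and scores["lb_text_audio"] > THRESHOLDS["lb_text_audio"]
        and scores["lb_text_video"] > THRESHOLDS["lb_text_video"]
    )

def rank(submissions: list[dict]) -> list[dict]:
    """Rank qualifying submissions by AV-Align, breaking ties with CAVP
    (both assumed to be higher-is-better)."""
    qualified = [s for s in submissions if qualifies(s)]
    return sorted(qualified, key=lambda s: (s["av_align"], s["cavp"]), reverse=True)
```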
We might have misunderstood your question, but let us clarify what we meant. As explained in the challenge rules,
The top entries in the final leaderboard will be assessed by human evaluation, and the award winning teams will be selected based only on the results of this subjective evaluation.
We plan to assess the top 10 models chosen according to av_align. In the 2nd phase, you can submit multiple systems. However, even if you occupy the leaderboard from 1st to 10th place in the end, we will assess only your top-1 model and pick the remaining 9 models from other participants, so that we can reward as many participants as possible.
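As a rough illustration, a “best system per participant” selection could look like the sketch below; the `participant` and `av_align` field names and the `select_for_human_eval` helper are hypothetical, not the actual selection code.

```python
# Minimal sketch of a "best system per participant" selection for human evaluation.
# NOTE: the field names and the helper are hypothetical, for illustration only.

def select_for_human_eval(leaderboard: list[dict], k: int = 10) -> list[dict]:
    """Walk the leaderboard by av_align and keep at most one entry per participant."""
    ranked = sorted(leaderboard, key=lambda e: e["av_align"], reverse=True)
    chosen, seen = [], set()
    for entry in ranked:
        if entry["participant"] in seen:
            continue  # lower-ranked systems from the same participant are skipped
        chosen.append(entry)
        seen.add(entry["participant"])
        if len(chosen) == k:
            break
    return chosen
```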
@snehananavati @aicrowd_team Hello, for Track 2, would it be possible to first generate a video using an unconditional video generation model and then synthesize the corresponding audio using a video-to-audio model?
I am writing to kindly inquire about the status of the results for the Sounding Video Generation (SVG) Challenge 2024, which concluded over a month ago. I understand that preparing and reviewing submissions can be a time-consuming process, and I truly appreciate all the effort that has gone into organizing the event.
Could you please let me know when we can expect the official announcement of the results?
The organisers at Sony are currently conducting the final round of human evaluation. As per the challenge rules, the top entries on the final leaderboard will be assessed through human evaluation, and the winning teams will be selected based on the results of this subjective assessment.
We are awaiting the outcome from the organisers and will share an update as soon as we receive it.