💬 Feedback & Suggestions

We are constantly trying to improve this challenge for you and would appreciate any feedback you might have! 🙌

Please reply to this thread with your suggestions and feedback on making the challenge better for you!

  • What have been your major pain points so far?
  • What would you like to see improved?

All The Best!

I submitted my baseline, but it is stuck in the validation phase. Can you help me check this?

@wangzhiyu918 : This should be resolved with the instructions in this post. Best of luck.

@aicrowd_team Hello, I have submitted in phase 2, but the leaderboard hasn’t updated yet. Could you please check this?

@wangzhiyu918 : we see your submission has already been evaluated now. Best of luck.

@aicrowd_team Yes; however, it has not shown up on the leaderboard of phase 2.

@aicrowd_team My submission has been evaluated successfully; however, it has not shown up on the leaderboard of phase 2.

Hi, the issue is now fixed. You should be able to see your submission on the leaderboard now.

@snehananavati Could you check this for me in the temporal-alignment-track task? I can’t submit my code to GitLab because I got this error (I clicked and accepted all the rules).

sub hash: 6ded30585b5f2672b9c3c07888d7dce4a25d9dc2

Submission failed : You have not qualified for this round. Please review the challenge rules at www.aicrowd.com

Dear @aicrowd_team,
Thank you for this great competition.
I have two questions.

  1. What is the exact eval score?
  • According to the leaderboard, the eval metric seems to be av_align, but the score matches TA for #280900 and AV_ALIGN for #280380 for me. Which one is the correct one?
  2. Can we choose two submissions at the end of the competition?

Best regards

Hello, can you please specify the track for your first query?
Edit: As for your second query, at the end of the competition (subjective evaluation), the best system from each participant is used, even if three systems from the same participant are ranked 1st to 3rd.

  1. It’s for [Temporal Alignment Track]
  2. I didn’t understand the meaning of “even if three systems from the same participant”. The competition is in phase 2 (stage) now. Do you mean that the best system (submission) from both phase 1 and phase 2 will be used for the final? There was a warm-up stage, but we can’t see the warm-up stage results now.

Here’s a response from the organiser:

  1. The answer is av_align. We’d like to refer you to the challenge rules for details:
    AIcrowd | Sounding Video Generation (SVG) Challenge 2024 | Challenge Rules
    The following explanation is an excerpt from the challenge rules page (a code sketch of this ranking rule follows at the end of this reply):

We use the AV-Align as the main metric for ranking and the CAVP score as the secondary metric to break ties. The other four metrics are used to exclude entries that provide low-quality data from the ranking. Specifically, if the score of the submitted model does not exceed the threshold value in any one of these four metrics, the model is excluded from the ranking. The threshold is set as follows: 2.0 for FAD, 900 for FVD, 0.25 for LanguageBind text-audio score, and 0.12 for LanguageBind text-video score.

  2. We might have misunderstood your question, but let us clarify what we meant. As explained in the challenge rules,

The top entries in the final leaderboard will be assessed by human evaluation, and the award winning teams will be selected based only on the results of this subjective evaluation.

We plan to assess the top 10 models chosen according to av_align. In the 2nd phase, you can submit multiple systems. However, even if you occupy the leaderboard from 1st place to 10th in the end, we will assess only your top-1 model and pick out 9 models from other participants, so that we can reward as many participants as possible (see the sketch below).

Hope this answers your questions.
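
To make the two rules above concrete, here is a minimal Python sketch of how the threshold filtering, the AV-Align/CAVP ranking, and the top-1-per-participant selection could work. This is an illustration only, not the official evaluation code: the entry field names (`fad`, `fvd`, `lb_text_audio`, `lb_text_video`, `av_align`, `cavp`, `participant`) are hypothetical, and the threshold directions (upper bounds for FAD/FVD, lower bounds for the LanguageBind scores) are assumptions based on the rule excerpt.

```python
# Illustrative sketch only; not the official AIcrowd evaluation code.
# Assumed threshold directions: FAD and FVD are "lower is better"
# (upper bounds), LanguageBind scores are "higher is better" (lower bounds).

# (threshold, direction) per quality metric, from the rule excerpt above
QUALITY_THRESHOLDS = {
    "fad": (2.0, "upper"),             # excluded if FAD > 2.0
    "fvd": (900.0, "upper"),           # excluded if FVD > 900
    "lb_text_audio": (0.25, "lower"),  # excluded if text-audio score < 0.25
    "lb_text_video": (0.12, "lower"),  # excluded if text-video score < 0.12
}

def qualifies(entry: dict) -> bool:
    """True if the entry passes all four quality thresholds."""
    for metric, (threshold, direction) in QUALITY_THRESHOLDS.items():
        if direction == "upper" and entry[metric] > threshold:
            return False
        if direction == "lower" and entry[metric] < threshold:
            return False
    return True

def rank_entries(entries: list[dict]) -> list[dict]:
    """Rank qualifying entries by AV-Align (desc), tie-broken by CAVP (desc)."""
    qualified = [e for e in entries if qualifies(e)]
    return sorted(qualified, key=lambda e: (-e["av_align"], -e["cavp"]))

def select_for_human_eval(ranked: list[dict], k: int = 10) -> list[dict]:
    """Keep each participant's best-ranked entry only, then take the top k."""
    selected, seen = [], set()
    for entry in ranked:
        if entry["participant"] in seen:
            continue  # only the top-1 model per participant is assessed
        selected.append(entry)
        seen.add(entry["participant"])
        if len(selected) == k:
            break
    return selected
```

Under this reading, a participant who holds places 1 through 10 would still contribute only one model to the subjective evaluation, with the other nine slots going to the next-best entries from other participants.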

@snehananavati @aicrowd_team Hello, for Track 2, would it be possible to first generate a video using an unconditional video generation model and then synthesize the corresponding audio using a video-to-audio model?