About the Temporal Alignment Track category

Can we use extra data for training?

According to the documentation:

"""
To adapt this dataset for text-conditional sounding video generation, captions for all video clips were automatically generated using LLaVA-Next. These captions are provided along with the video clips.
"""

But I could not find the captions in the dataset provided.

Sorry for the late reply. Yes, using extra data for training is allowed. Please check “15. DATA USE AGREEMENT” in the challenge rules.
AIcrowd | Temporal Alignment Track | Challenge_rules

Thanks for reporting this issue. I have re-uploaded the zip file containing the caption files.

I downloaded the data a few minutes ago, but I still don’t see any caption data. Could you please check this again?

@kcy4 : Sorry for the confusion, the file has now been correctly updated on the Resources Page.

The updated file name is SVG2024-temporal-track-v1.zip.

Best of luck!

@aicrowd_team : Thank you for the update. I confirmed that there are two caption files (flist_train.csv and flist_val.csv).
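
For anyone who wants to run the same check on their own download, here is a minimal sketch that looks for the two caption files inside the zip without extracting it. The file names come from this thread. The helper name `find_caption_files` is made up for illustration, and the folder layout inside the archive is an assumption, so the lookup matches on the base file name only.

```python
import sys
import zipfile
from pathlib import PurePosixPath

# File names as reported in this thread; their location inside the
# archive is an assumption, so we match on the base name only.
EXPECTED_CAPTION_FILES = ("flist_train.csv", "flist_val.csv")


def find_caption_files(zip_path, expected=EXPECTED_CAPTION_FILES):
    """Map each expected file name to its path inside the archive, or None if missing."""
    with zipfile.ZipFile(zip_path) as archive:
        members = archive.namelist()

    found = {}
    for name in expected:
        # Compare only the last path component, ignoring any parent folders.
        matches = [m for m in members if PurePosixPath(m).name == name]
        found[name] = matches[0] if matches else None
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_captions.py SVG2024-temporal-track-v1.zip
    for name, member in find_caption_files(sys.argv[1]).items():
        print(f"{name}: {member or 'MISSING'}")
```

If either file is reported as MISSING, the download is probably the older archive and is worth fetching again from the Resources Page.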

I submitted my baseline but the progress is stuck in the validation phase. Can you help me check this?