Many participants have struggled with the time-out problem.
My team was also frustrated when our inference failed at 100%.
Since some submissions failed when inference was at 100%, I suspect the last track is the longest one.
So why not add an additional phase that filters out time-out submissions using the longest track?
Then participants would not have to wait for inference over the entire test set.
It would also reduce the evaluation system’s workload, since it would not have to process every track for a submission that is going to time out anyway.
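A rough sketch of what that pre-check could look like on the evaluation side. Everything here is hypothetical: the `predict.py` entry point, the `TIME_LIMIT_S` budget, and the track ordering are stand-ins, not the actual evaluator:

```python
import subprocess

TIME_LIMIT_S = 600  # assumed per-track budget in seconds, not the real value

def run_track(track_path: str) -> bool:
    """Run the submission on one track; return False if it times out."""
    try:
        subprocess.run(
            ["python", "predict.py", track_path],  # placeholder entry point
            check=True,
            timeout=TIME_LIMIT_S,
        )
        return True
    except subprocess.TimeoutExpired:
        return False

def evaluate(tracks_longest_first: list[str]) -> None:
    # Pre-phase: try the longest track first and fail fast on timeout,
    # instead of burning compute on the whole test set.
    if not run_track(tracks_longest_first[0]):
        raise SystemExit("Timed out on the longest track; rejecting early.")
    for track in tracks_longest_first[1:]:
        run_track(track)
```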
It would be great if the organizers reproduced the training of the winning models from Leaderboard A at the end of the competition. Otherwise, participants could hide their usage of extra data.
Reminder: validation-phase songs don’t count toward your leaderboard scores.
Tentative plan:
We would not release an additional song from the private set in the validation phase.
But we would include an additional song from MUSDB18 (or similar) whose length is about the same as the longest song in the private set.
We have added an extra song of length 03:30 to the validation phase.
Your submission runs on this song, BUT the separated sources aren’t counted towards any of the scores.
Timeouts, if any, should be visible early enough from now on.
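On the participant side, a quick local check in this spirit can catch timeouts before submitting. A minimal sketch, assuming your model exposes a `separate(mixture)` function and that you know (or guess) the per-song time budget `budget_s`:

```python
import time

import numpy as np

SAMPLE_RATE = 44100            # MUSDB18(HQ) sample rate
CANARY_SECONDS = 3 * 60 + 30   # the extra validation song is 03:30 long

def check_canary_runtime(separate, budget_s: float) -> None:
    """Time one full separation of a canary-length stereo clip.

    `separate` is your own inference function and `budget_s` your
    assumed per-song limit; both are placeholders here.
    """
    mixture = np.random.randn(2, SAMPLE_RATE * CANARY_SECONDS).astype(np.float32)
    start = time.perf_counter()
    separate(mixture)
    elapsed = time.perf_counter() - start
    verdict = "OK" if elapsed <= budget_s else "likely timeout"
    print(f"{elapsed:.1f}s for a {CANARY_SECONDS}s clip -> {verdict}")
```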
Participants are not able to remove their submissions from the leaderboard. It would be great if that became possible; it could be useful for participants who set the external_dataset_used flag incorrectly.
Not sure if my thinking on Leaderboard A vs. Leaderboard B is correct, but should models from leaderboard A supersede models from leaderboard B?
Hypothetically, say:
Leaderboard A:
Model 1, SDR = 10.0
Model 2, SDR = 9.0
Leaderboard B:
Model 3, SDR = 7.0
Model 4, SDR = 6.0
Because Models 1 and 2 have higher SDRs, do they also automatically “win” leaderboard B?
Basically, I can see both scenarios making sense (sketched in code below):
Option 1: leaderboard A is strictly “external_dataset_used=False”, leaderboard B is strictly “external_dataset_used=True”
Option 2: leaderboard A is strictly “external_dataset_used=False”; in leaderboard B, “external_dataset_used=True” is allowed, but all leaderboard A models are also automatically eligible
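In code terms, the two options differ only in how the candidate pool for leaderboard B is built. A toy sketch of that difference, using the hypothetical models and SDRs from above (the `Submission` fields are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class Submission:
    name: str
    sdr: float
    external_dataset_used: bool

subs = [
    Submission("Model 1", 10.0, False),
    Submission("Model 2", 9.0, False),
    Submission("Model 3", 7.0, True),
    Submission("Model 4", 6.0, True),
]

# Leaderboard A is the same under both options: MUSDB18-only systems.
board_a = sorted((s for s in subs if not s.external_dataset_used),
                 key=lambda s: s.sdr, reverse=True)

# Option 1: leaderboard B holds only systems that used external data.
board_b_opt1 = sorted((s for s in subs if s.external_dataset_used),
                      key=lambda s: s.sdr, reverse=True)

# Option 2: leaderboard B admits every submission, so leaderboard A
# models can outrank B-only ones.
board_b_opt2 = sorted(subs, key=lambda s: s.sdr, reverse=True)

print([s.name for s in board_b_opt2])
# -> ['Model 1', 'Model 2', 'Model 3', 'Model 4']
```

Under Option 2, Models 1 and 2 top leaderboard B as well, which is exactly the “supersede” behaviour asked about.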
@agent @sevagh Yes, I agree with @agent: the second option should be used, and leaderboard B should also include systems that did not use external datasets (they are allowed to use extra data but don’t have to).
From experience, systems that are limited to the MUSDB18 training set will not perform as well as systems that are allowed to use more data. Hence, the top systems of leaderboard A will not appear at the top of leaderboard B.
There are two leaderboards – one for systems that were solely trained on the training part of MUSDB18HQ (“Leaderboard A”) and one for systems trained on any data (“Leaderboard B”).
Will there be an “open-source reveal day” or something, presumably after the competition deadline (July 31), when contestants should make their code public?
It could be a real party to have 3 months of hidden work come to light.