I have a question about the training data for Task A (Label Noise) and Task B (Bleeding).
As I understand it, all of the tasks (Task A, B, C) have the same evaluation data (the test data used in MDX2021) and the same target (to build well-performing music separation models), but different training data.
Considering this, I suppose that if I were an ML engineer at a company and had to build a decent separation model in a month, the first thing I would do is data pre-processing and cleaning. If the data has noisy labels (as in Task A), I would correct or remove them. If the data has bleeding (as in Task B), I would also remove it. These are not fancy techniques, and they will take some time, but I think they are the most straightforward way to build well-performing separation models.
I’m curious to know if manual training data manipulation or refinements like these are allowed in the competition.
Please correct me if I have any misunderstandings.
Thanks for your question.
Your understanding of the three tasks (A, B, C) is correct: the objective is the same, the evaluation data is the same, but the training data differs.
The reason for having leaderboards A and B is that we want researchers to investigate robust training methods, as issues like label swaps and bleeding occur very often when dealing with (big) training sets, and they do impact the performance of the systems.
For this reason, we generally prefer to avoid manual intervention on the data (it would not really scale to big datasets).
Moreover, the training sets in this challenge have been designed so that manual intervention is either not possible or harmful (i.e., you are left with very little data).
Consider that label swaps in our case are not applied at the level of the 4 target stems (vocals, bass, drums, other), but before the many raw stems of a song are grouped into those 4. This means that an individual instrument in the track has been grouped with instruments it does not belong with. As a consequence, manual intervention can only discard a stem, not fix it (hence, you would be left with little usable data).
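To make this concrete, here is a purely illustrative sketch (toy stem names, placeholder waveforms, and an arbitrary swap probability; not the actual corruption code) of what a swap at the raw-stem level implies once the stems are summed into the 4 targets:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw stems of one song (placeholder waveforms) with their true 4-stem group.
raw_stems = {
    "kick":       ("drums",  rng.standard_normal(44100)),
    "snare":      ("drums",  rng.standard_normal(44100)),
    "bass_gtr":   ("bass",   rng.standard_normal(44100)),
    "lead_vox":   ("vocals", rng.standard_normal(44100)),
    "rhythm_gtr": ("other",  rng.standard_normal(44100)),
}

GROUPS = ["vocals", "bass", "drums", "other"]

def build_targets_with_label_noise(raw_stems, p_swap=0.2, seed=1):
    """Sum raw stems into the 4 target stems, occasionally under a wrong label."""
    r = np.random.default_rng(seed)
    targets = {g: np.zeros(44100) for g in GROUPS}
    for name, (true_group, audio) in raw_stems.items():
        group = true_group
        if r.random() < p_swap:
            # simulate a label swap on this individual raw stem
            group = r.choice([g for g in GROUPS if g != true_group])
        # a mislabeled stem is summed into the wrong target and cannot be
        # separated out again afterwards
        targets[group] += audio
    return targets

targets = build_targets_with_label_noise(raw_stems)
```

Once a wrong stem has been mixed into its group, the only manual "fix" is to discard the affected target (or the whole song).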
For bleeding, we simulated the corruption so that it should not be possible to fix it with straightforward manual signal processing. Again, we want to push researchers in a direction where they need to train with corrupted data and devise a training strategy that achieves good convergence nonetheless.
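As a toy illustration of the general idea only (a little of every other source leaking into each stem), and not of the actual corruption used in the challenge, which is deliberately harder to undo:

```python
import numpy as np

def add_bleeding(stems, bleed_level=0.1, seed=0):
    """Toy bleeding: mix a small, randomly scaled amount of every other stem into each stem.

    `stems` maps a stem name to a mono waveform (equal-length np.ndarray).
    """
    r = np.random.default_rng(seed)
    clean = {name: audio.copy() for name, audio in stems.items()}
    bled = {}
    for name, audio in clean.items():
        out = audio.copy()
        for other_name, other_audio in clean.items():
            if other_name != name:
                # leak a fraction of the other source into this stem
                out += bleed_level * r.uniform(0.5, 1.5) * other_audio
        bled[name] = out
    return bled
```

Even in this simplified form, undoing the corruption would require knowing the per-pair leakage gains; the training strategy has to cope with the bleeding instead.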
For the reasons above, manual methods should be avoided as much as possible; you can still try to “clean” the data using some automatic method (as those can easily scale with the size of the dataset), but keep in mind that we only allow approaches that use the training data of the respective leaderboard (the corrupted one).
For example, you cannot apply a musical instrument tagger/classifier like YAMNet on the training set, as YAMNet has been trained with other data.
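Purely as a hypothetical sketch of the kind of automatic, training-data-only check that would be in scope (the feature and threshold are made up and untested, and this is not part of the challenge tooling): flag stems whose spectral statistics are outliers within their labeled class.

```python
import numpy as np

def spectral_centroid(audio, sr=44100, n_fft=4096):
    """Mean spectral centroid of a mono waveform (a coarse timbre summary)."""
    frames = audio[: n_fft * (len(audio) // n_fft)].reshape(-1, n_fft)
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    centroids = (spec * freqs).sum(axis=1) / (spec.sum(axis=1) + 1e-8)
    return float(centroids.mean())

def flag_suspect_stems(stems_by_label, k=3.5):
    """Flag stems whose centroid deviates strongly from their label's median (MAD rule)."""
    flagged = []
    for label, stems in stems_by_label.items():  # label -> {song_id: waveform}
        feats = {sid: spectral_centroid(audio) for sid, audio in stems.items()}
        vals = np.array(list(feats.values()))
        med = np.median(vals)
        mad = np.median(np.abs(vals - med)) + 1e-8
        for sid, feat in feats.items():
            if abs(feat - med) / mad > k:
                flagged.append((label, sid))
    return flagged
```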
You can also take a look at this other message in the forum; it goes in the same direction:
Feel free to ask here if you have any other questions.