Dataset explanation

I need help understanding the datasets, especially SiSEC18-MUS-7-WAV. Would you mind explaining them? Also, where is the test dataset? Is it taken only from MUSDB18-7-WAV? Thanks!

Hello @heaven,

this dataset is meant as a preview version of the original MUSDB18 dataset (much smaller in size, as it contains only 7 seconds of each original song) and was created with this and this code. Each excerpt is taken from the part of the song with the most activity, so it can serve as a preview of the full dataset.

Thanks! Will we have more datasets for this challenge?

Hello @heaven, yes - we will provide specific datasets for Leaderboards A and B in Track A, on which participants must train. They contain wrongly labeled stems (Leaderboard A) or bleeding between stems (Leaderboard B). We are currently preparing them and will publish them when the challenge starts :slight_smile: (Currently, there is only a warm-up round that allows participants to experiment.)

Hi @StefanUhlich, for the corrupted datasets, i.e., Leaderboards A and B, it's mentioned that we can only train on those datasets. Does that mean we cannot use pre-trained models and fine-tune them on the corrupted datasets?

Hello @sandesh_bharadwaj97, welcome to SDX 2023 :slight_smile:

for the corrupted datasets, i.e., Leaderboards A and B, it's mentioned that we can only train on those datasets. Does that mean we cannot use pre-trained models and fine-tune them on the corrupted datasets?

Yes, for MDX Leaderboard A and Leaderboard B you can only use models that are trained on the provided datasets; you are not allowed to fine-tune models that were pretrained on other datasets (even datasets like ImageNet from computer vision, which is a different domain than audio).


Hi @StefanUhlich , I have a couple of questions about the soon-to-be-released dataset for Bleeding.

  • Will the stems still sum to the original mixture? What I mean is, for example, if drums = drums + 0.2*bass, will the bass be reduced to 0.8*bass or will it still be the original bass?

  • Will the test GT stems also have bleeding, or will they be clean?

Hello @subatomicseer,

Will the stems still sum to the original mixture? What I mean is, for example, if drums = drums + 0.2*bass, will the bass be reduced to 0.8*bass or will it still be the original bass?

No, this is not the case - the bleeding comes on top and, hence, is an additional component for every stem; the bass stem stays the original bass.

Please note that the two new datasets do not contain a “mixture.wav” but only the stems “bass.wav”, “drums.wav”, “other.wav” and “vocals.wav”. From these you can create the mixture yourself by simply summing the four stems.
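As an illustration (my own sketch, not official challenge code), recreating the mixture by summing the four stems could look like this, assuming each stem has already been loaded as a float array (e.g., via `soundfile.read`):

```python
import numpy as np

# Hypothetical sketch: rebuild the mixture by summing the four provided stems.
# Assumes each stem is a float array of shape (num_samples, channels),
# e.g. loaded with soundfile.read("bass.wav") - file loading is omitted here.
def make_mixture(stems):
    """Linear sum of the stem arrays, with no normalization applied."""
    return np.sum(
        np.stack([stems[k] for k in ("bass", "drums", "other", "vocals")]),
        axis=0,
    )

# Toy example: four random stereo "stems" of one second at 44.1 kHz
rng = np.random.default_rng(0)
stems = {
    name: 0.1 * rng.standard_normal((44100, 2)).astype(np.float32)
    for name in ("bass", "drums", "other", "vocals")
}
mixture = make_mixture(stems)
print(mixture.shape)  # (44100, 2)
```

Note that if stem peaks add up, the resulting mixture may clip when written back to an integer WAV format, so keeping the audio as float is the safer choice.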

Will the test GT stems also have bleeding, or will they be clean?

No, they do not have bleeding - they are clean. Actually, we use the same hidden evaluation dataset for all three leaderboards (MDXDB21, which is described here) so that results are comparable across leaderboards. It will then be interesting to see how much models suffer from having to learn from imperfect data (containing label swaps or bleeding).

Kind regards

Stefan

Hi @StefanUhlich, just following up, as the datasets have now been released for MDX Leaderboards A and B. Are we allowed to preprocess the datasets and clean them partially or fully, in case that's a strategy we want to employ?

Hi @sandesh_bharadwaj97, you can preprocess the datasets and try to employ an automatic method that reduces the impact on DNN training; e.g., for the bleeding you could look into methods like

Prätzlich, Thomas, et al. “Kernel additive modeling for interference reduction in multi-channel music recordings.” 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015.

We have created the datasets such that it should not be possible to just “clean them up” by manual relabeling or simple signal-processing-based filtering:

  • For the label swaps, the swap happens before mixing down to four stems: e.g., if a song contains the five instruments “bass guitar”, “drumset”, “male singer”, “male background choir” and “synthesizer”, we might put “male background choir” into “other” and then mix everything down to the four stems “bass”, “drums”, “vocals” and “other” - now “other” contains a mixture of “vocals” and “other” and is therefore a noisy training target.
  • For the bleeding, we used a set of random effects (reverb, gain, shifts) to create it, so a simple filtering/summation should not be sufficient to undo it.
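To make the setup concrete, here is a rough sketch of how such bleeding could be simulated. This is purely my own assumption about the general idea - the organizers' actual effect chain (which also includes reverb), parameters and implementation are not shown here:

```python
import numpy as np

# Hedged sketch of bleeding simulation (illustrative only): each corrupted
# stem is the clean stem plus randomly gained and time-shifted copies of the
# other stems. The gain and shift ranges below are made-up values.
def add_bleeding(stems, rng, max_gain=0.2, max_shift=1000):
    names = list(stems)
    bled = {}
    for name in names:
        out = stems[name].copy()
        for other in names:
            if other == name:
                continue
            gain = rng.uniform(0.05, max_gain)       # random bleed level
            shift = int(rng.integers(0, max_shift))  # random time offset
            out += gain * np.roll(stems[other], shift, axis=0)
        bled[name] = out
    return bled

rng = np.random.default_rng(1)
clean = {
    name: 0.1 * rng.standard_normal((4410, 2))
    for name in ("bass", "drums", "other", "vocals")
}
noisy = add_bleeding(clean, rng)
```

Because the bleed is purely additive, the corrupted stems no longer sum to the sum of the clean stems - which matches the statement above that the bleeding "comes on top" of each stem.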

With these two datasets, we want to investigate how good models can become if they are trained on noisy training data (e.g., by modifying the loss function, regularizing the networks, …).
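As one hedged example of such a loss modification (my own illustration, not a challenge baseline), a trimmed loss discards the worst-fitting fraction of samples, which limits the influence of corrupted target regions:

```python
import numpy as np

# Illustrative noise-robust loss: keep only the best-fitting `keep` fraction
# of per-sample absolute errors, so grossly wrong (noisy) target samples are
# ignored when averaging.
def trimmed_l1_loss(pred, target, keep=0.9):
    err = np.abs(pred - target).ravel()
    k = max(1, int(keep * err.size))
    return float(np.mean(np.sort(err)[:k]))

pred = np.zeros(10)
target = np.array([0.0] * 9 + [100.0])  # one heavily corrupted sample
print(trimmed_l1_loss(pred, target, keep=0.9))  # 0.0 - the outlier is trimmed
```

With `keep=1.0` this reduces to the ordinary mean absolute error, so the trimming fraction acts as a knob between standard and noise-robust training.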

Please note that you cannot use any model trained on datasets other than the one provided for Leaderboard A or Leaderboard B, respectively.

Kind regards
Stefan
