Dear organizers and all participants,
Thank you for a wonderful and competitive challenge this year!
We have released the training code for all the leaderboards in the following repository:
Submission code is linked in the training repository.
A brief summary of the solutions:
MDX Leaderboard A:
- Train two models
- Use the DWT-Transformer-UNet model trained above to score the Labelnoise dataset. The idea is that if the model separates a stem well and the stem is clean, the SDR should be high.
- We then kept all stems with SDR above 9 dB and manually verified a subset of them, removing some obviously noisy ones.
- After this, we trained a set of lightweight BSRNN models on the filtered subset.
- Final submission is a per-source weighted blend of all three model outputs.
- The BSRNN models trained on the filtered subset gave a significant boost to the vocals stem; for the other stems the impact was less significant, and the noise-robust training worked quite well.
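The SDR-based filtering step above can be sketched as follows. This is a minimal illustration, not our actual pipeline code: the plain (non-scale-invariant) SDR formula, the 9 dB threshold from the description, and the `filter_stems` helper and its tuple layout are assumptions for the example.

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-8):
    # Signal-to-distortion ratio in dB between a reference stem
    # and the model's separated estimate.
    num = np.sum(reference ** 2)
    den = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10((num + eps) / (den + eps))

def filter_stems(stems, threshold_db=9.0):
    # Keep only stems whose separated estimate scores above the
    # SDR threshold; (name, reference, estimate) tuples are assumed.
    return [name for name, ref, est in stems if sdr_db(ref, est) > threshold_db]
```

A clean stem that the model separates well scores high and is kept; a noisy label, where reference and estimate disagree, falls below the threshold and is dropped (subject to the manual verification pass described above).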
MDX Leaderboard B:
- Train two models
- Final submission is a per-source weighted blend of these two model outputs.
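The per-source weighted blending used in both MDX submissions amounts to a convex combination per stem. A minimal sketch, assuming two models and per-source weights (the function name, dict layout, and example weights are hypothetical; the actual weights were tuned per leaderboard):

```python
import numpy as np

def blend_per_source(outputs_a, outputs_b, weights_a):
    # Each source gets its own weight for model A's output;
    # model B's output receives the complementary weight.
    return {
        src: weights_a[src] * outputs_a[src] + (1.0 - weights_a[src]) * outputs_b[src]
        for src in outputs_a
    }
```

This lets a model that is stronger on, say, vocals dominate that stem while another model dominates drums or bass.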
CDX Leaderboard A:
- Preprocess the dataset:
- Remove silences from the dialog and music stems and recombine the remaining segments with cross-fading wherever possible.
- Leave the Effect stem as is.
- Train two models
- DWT-Transformer-UNet with L1 loss
- BSRNN with L1 loss
- BSRNN dialog outputs sounded really good (and had a much higher validation score) but performed poorly on the leaderboard, so we added a scaled residual to the dialog output only to get a decent score.
- Final submission is a weighted blend of these two model outputs.
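The silence-removal preprocessing can be sketched as below. This is an illustrative version under assumed parameters: the RMS threshold, frame size, fade length, and both helper names are hypothetical, not values from our pipeline.

```python
import numpy as np

def nonsilent_runs(audio, frame=1024, thresh=1e-3):
    # Group consecutive non-silent frames (judged by RMS energy)
    # into contiguous runs of samples.
    runs, cur = [], []
    for i in range(len(audio) // frame):
        seg = audio[i * frame:(i + 1) * frame]
        if np.sqrt(np.mean(seg ** 2)) > thresh:
            cur.append(seg)
        elif cur:
            runs.append(np.concatenate(cur))
            cur = []
    if cur:
        runs.append(np.concatenate(cur))
    return runs

def crossfade_join(runs, fade=256):
    # Recombine runs, overlapping adjacent segments with a linear
    # cross-fade so the joins are free of clicks.
    out = runs[0]
    for seg in runs[1:]:
        f = min(fade, len(out), len(seg))
        if f == 0:
            out = np.concatenate([out, seg])
            continue
        ramp = np.linspace(0.0, 1.0, f)
        out = np.concatenate([out[:-f], out[-f:] * (1 - ramp) + seg[:f] * ramp, seg[f:]])
    return out
```

Applied to the dialog and music stems, this removes long silent stretches so the models see denser training material.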
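The scaled-residual trick for the dialog stem can be sketched as a post-processing step. This is only our reading of the idea: the residual here is the mixture minus the sum of all estimated stems, and the scale `alpha`, the function name, and the dict layout are assumptions (the actual scale was tuned against the leaderboard).

```python
import numpy as np

def add_scaled_residual(mixture, estimates, target="dialog", alpha=0.2):
    # Residual = whatever of the mixture the model did not assign
    # to any stem; feed a scaled copy back into one target stem.
    residual = mixture - sum(estimates.values())
    out = dict(estimates)
    out[target] = out[target] + alpha * residual
    return out
```

Leaking a little of the residual back into the dialog estimate trades some of the overly clean (but leaderboard-penalized) output for a score closer to the metric's preference.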
Best regards,
Nabarun Goswami (subatomicseer)
email: nabarungoswami@mi.t.u-tokyo.ac.jp
Harada-Osa-Mukuta-Kurose Lab
The University of Tokyo