Solutions for MDX Leaderboard A (2nd place), B (3rd place) and CDX Leaderboard A (3rd place)

Dear organizers and all participants,

Thank you for a wonderful and competitive challenge this year!

We have released the training code for all the leaderboards at the following repository:


The submission code is linked in the training repository.

A brief summary of the solutions:

MDX Leaderboard A:

  • Train two models
  • Use the DWT-Transformer-UNet model trained above to score the Labelnoise dataset. The idea is that if the model separates a stem well and the stem itself is clean, the SDR should be high.
  • Following this, we kept all stems with an SDR above 9 dB and manually verified a subset of them, removing some obviously noisy ones.
  • After this, we trained a set of lightweight BSRNN models on the filtered subset.
  • The final submission is a per-source weighted blend of all 3 model outputs.
  • The BSRNN models trained on the filtered subset gave a significant boost to the vocals stem. For the other stems the impact was less significant, and the noise-robust training worked quite well.
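
For anyone who wants to try the SDR-based filtering, here is a minimal sketch of the idea, not the code from our repository. It assumes a plain energy-ratio SDR and that each labelled stem is compared against the model's estimate for the same track. The names `sdr_db`, `filter_clean_stems` and `separate_fn` are hypothetical; only the 9 dB threshold comes from the summary above.

```python
import numpy as np

def sdr_db(reference, estimate, eps=1e-8):
    """Energy-ratio SDR in dB between a labelled stem and the model's estimate."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_power = np.sum(reference ** 2)
    error_power = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10((signal_power + eps) / (error_power + eps))

def filter_clean_stems(stems, separate_fn, threshold_db=9.0):
    """Return the names of stems whose SDR exceeds the threshold.

    stems: dict mapping a name to a (mixture, labelled_stem) pair of arrays.
    separate_fn: hypothetical callable that maps a mixture to the estimated stem.
    """
    kept = []
    for name, (mixture, labelled_stem) in stems.items():
        estimate = separate_fn(mixture)
        # A high SDR suggests the label agrees with what the model separated.
        if sdr_db(labelled_stem, estimate) > threshold_db:
            kept.append(name)
    return kept
```

The stems that pass the threshold would still need the manual check mentioned above, since a high SDR only indicates agreement between the model and the label.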

MDX Leaderboard B:

CDX Leaderboard A:

  • Preprocess the dataset:
    • Remove silences from dialog and music stems and recombine segments with cross-fading wherever possible.
    • Leave the Effect stem as it is.
  • Train two models
  • The BSRNN dialog outputs sounded really good (and scored much higher on validation) but performed poorly on the leaderboard, so we added the scaled residual to the dialog output only to get a decent score.
  • The final submission is a weighted blend of these two model outputs.
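
The last two steps can be sketched roughly as follows. This is an illustration under assumptions, not the submission code: the residual is taken here to be the mixture minus the sum of the estimated stems, and the scale and blend weights are placeholder values. The function names `add_scaled_residual` and `blend_per_source` are hypothetical.

```python
import numpy as np

def add_scaled_residual(estimates, mixture, target="dialog", scale=0.5):
    """Add a scaled share of the residual to a single stem.

    Assumes residual = mixture - sum of all estimated stems.
    The scale value is a placeholder, not a tuned setting.
    """
    residual = mixture - sum(estimates.values())
    out = dict(estimates)
    out[target] = estimates[target] + scale * residual
    return out

def blend_per_source(model_outputs, weights):
    """Weighted average of several models' outputs, with separate weights per source.

    model_outputs: list of dicts mapping source name to an array, one dict per model.
    weights: dict mapping source name to one weight per model (normalized here).
    """
    blended = {}
    for source, w in weights.items():
        w = np.asarray(w, dtype=np.float64)
        w = w / w.sum()
        blended[source] = sum(wi * out[source] for wi, out in zip(w, model_outputs))
    return blended
```

The same per-source blending applies to the MDX submission, just with three model outputs instead of two.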

Best regards,

Nabarun Goswami (subatomicseer)
Harada-Osa-Mukuta-Kurose Lab
The University of Tokyo