MDX Leaderboard C 4th place submission

Submission code with model downloads: GitHub - KimberleyJensen/kmdx-net_music-source-separation

Model Summary

  • Models
  • Demucs models (all trained by Meta)
    • 4 x htdemucs_ft
    • demucs_mmi
    • a1d90b5c
  • Kuielab mdx-net models (all trained by me on a single 16 GB T4 GPU)
    • vocals.onnx (trained with BatchNorm2d; the vocals model is an ONNX file because I trained it before realizing PyTorch models were faster, and I had already deleted its checkpoint)
    • (trained with groupnorm num_groups=4)
    • (trained with groupnorm num_groups=2)
    • (trained with groupnorm num_groups=2)
  • Things I did to boost SDR
    • Modified the input mixture for each model:
      • input mixture for htdemucs_ft vocals = the original mixture
      • input mixture for htdemucs_ft drums = original mixture - htdemucs_ft vocals output
      • input mixture for htdemucs_ft bass = original mixture - (htdemucs_ft vocals output + htdemucs_ft drums output)
      • input mixture for the mdx-net bass, drums and other models = original mixture - vocals.onnx output
    • Using original mixture - (htdemucs_ft vocals + drums + bass outputs) as the other stem scored higher than using the dedicated other stem model.
    • Blending the model outputs
      • First I blended the Demucs model outputs together, then blended the combined Demucs outputs with the mdx-net model outputs.
      • The final blend weights between Demucs and mdx-net were [0.08, 0.08, 0.4, 0.88] for drums, bass, other and vocals respectively (0 means 100% Demucs is used, 1 means 100% mdx-net is used).
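
The input-mixture cascade and the residual "other" stem can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the submission code: `htdemucs_cascade`, `mdx_input` and `StemModel` are hypothetical names, and each model is assumed to be a callable that maps a (channels, samples) NumPy waveform to a single stem estimate of the same shape.

```python
from typing import Callable, Dict

import numpy as np

# Hypothetical model type: waveform of shape (channels, samples) -> one stem estimate.
StemModel = Callable[[np.ndarray], np.ndarray]


def htdemucs_cascade(mixture: np.ndarray,
                     vocals_model: StemModel,
                     drums_model: StemModel,
                     bass_model: StemModel) -> Dict[str, np.ndarray]:
    """Run the vocals -> drums -> bass chain, feeding each model the residual mixture."""
    vocals = vocals_model(mixture)                 # input: the original mixture
    drums = drums_model(mixture - vocals)          # input: mixture minus vocals
    bass = bass_model(mixture - (vocals + drums))  # input: mixture minus vocals and drums
    # "other" is taken as the residual instead of the output of a dedicated other model.
    other = mixture - (vocals + drums + bass)
    return {"vocals": vocals, "drums": drums, "bass": bass, "other": other}


def mdx_input(mixture: np.ndarray, mdx_vocals_model: StemModel) -> np.ndarray:
    """Shared input for the mdx-net bass, drums and other models: mixture minus vocals.onnx output."""
    return mixture - mdx_vocals_model(mixture)
```

In a real pipeline the callables would wrap htdemucs_ft and vocals.onnx inference; they are left abstract here so that only the subtraction order is shown.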

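The final Demucs/mdx-net blend can be sketched as a per-stem linear interpolation. The writeup gives only the weights and their endpoints (0 = all Demucs, 1 = all mdx-net), so the rule (1 - w) * demucs + w * mdx is an assumption consistent with them. `blend_stems` and `MDX_WEIGHTS` are hypothetical names, and the Demucs input is assumed to be the already-combined Demucs ensemble output, since the internal Demucs blend weights are not stated.

```python
import numpy as np

# Final blend weights from the writeup (0 = 100% Demucs, 1 = 100% mdx-net).
MDX_WEIGHTS = {"drums": 0.08, "bass": 0.08, "other": 0.4, "vocals": 0.88}


def blend_stems(demucs: dict, mdx: dict, weights: dict = MDX_WEIGHTS) -> dict:
    """Assumed blend rule: per stem, (1 - w) * demucs + w * mdx."""
    return {stem: (1.0 - w) * demucs[stem] + w * mdx[stem]
            for stem, w in weights.items()}
```

With these weights the vocals come mostly from mdx-net (88%), while drums and bass come almost entirely from Demucs (92%).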
Dataset details

Training data for the mdx-net models:
  • vocals.onnx: 2000 a cappellas and instrumentals
  • bass model: MUSDB18 + 200 extra bass and bassless tracks
  • drums model: MUSDB18 + 150 extra drums and drumless tracks
  • other model: MUSDB18 + 100 extra other and otherless tracks
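
The writeup does not say how the stem / stem-less pairs were turned into training examples. A common approach, assumed here, is to crop the same segment from both tracks, sum them to form the input mixture, and use the stem as the target. `make_training_example` is a hypothetical helper for illustration, not part of the mdx-net training code.

```python
import numpy as np


def make_training_example(stem: np.ndarray, stemless: np.ndarray,
                          segment_len: int, rng: np.random.Generator):
    """Hypothetical helper: crop an aligned segment and return (mixture, target)."""
    n = min(stem.shape[-1], stemless.shape[-1])
    if n < segment_len:
        raise ValueError("tracks are shorter than segment_len")
    # Same random start for both tracks so the stem and its complement stay aligned.
    start = int(rng.integers(0, n - segment_len + 1))
    target = stem[..., start:start + segment_len]
    mixture = target + stemless[..., start:start + segment_len]
    return mixture, target
```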

Training code for Kuielab mdx-net with modified model settings: GitHub - KimberleyJensen/mdx-net at mdx23