GitHub - KimberleyJensen/kmdx-net_music-source-separation - Submission code with models download
Model Summary
- Models
- Ensemble of Demucs and Kuielab mdx-net
- Demucs models (all trained by Meta)
- 4 x htdemucs_ft
- demucs_mmi
- a1d90b5c
- Kuielab MDX-Net models (all trained by me on a single 16 GB T4 GPU)
- vocals.onnx (trained with BatchNorm2d; the vocals model is an ONNX file because it was trained before I realized PyTorch models were faster, and I deleted the checkpoint)
- bass.pt (trained with GroupNorm, num_groups=4)
- other.pt (trained with GroupNorm, num_groups=2)
- drums.pt (trained with GroupNorm, num_groups=2)
- Things I did to boost SDR
- Modify the input mixtures for each model
- input mixture for htdemucs_ft vocals = the original mixture
- input mixture for htdemucs_ft drums = original mixture - output of htdemucs_ft vocals
- input mixture for htdemucs_ft bass = original mixture - outputs of htdemucs_ft vocals and drums
- input mixture for bass.pt, drums.pt and other.pt = original mixture - output of vocals.onnx
- Using the original mixture minus the htdemucs_ft vocals, drums and bass outputs as the other stem scored higher than using the actual other-stem model.
- Blending the model outputs
- First I blended the Demucs model outputs together, then blended the Demucs outputs with the MDX-Net model outputs
- [0.08, 0.08, 0.4, 0.88] was the final blend between Demucs and MDX-Net (0 means 100% Demucs, 1 means 100% MDX-Net), in the order drums, bass, other, vocals.
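The input-mixture cascade above can be sketched as follows. This is a minimal illustration, not the repo's code: the `fake_model` closures are toy stand-ins for the real htdemucs_ft inference calls, and the fractions are arbitrary.

```python
import numpy as np

# Toy "separators": each returns a fixed fraction of its input, standing in
# for real model inference (e.g. htdemucs_ft). Purely illustrative.
def fake_model(fraction):
    return lambda mix: fraction * mix

htdemucs_vocals = fake_model(0.4)
htdemucs_drums = fake_model(0.3)
htdemucs_bass = fake_model(0.2)

mixture = np.random.randn(2, 44100)  # stereo, 1 s at 44.1 kHz

# The cascade: each model sees the mixture with the previously
# extracted stems subtracted out.
vocals = htdemucs_vocals(mixture)
drums = htdemucs_drums(mixture - vocals)
bass = htdemucs_bass(mixture - vocals - drums)

# "other" is taken as the residual rather than a dedicated model's output.
other = mixture - vocals - drums - bass

# By construction the four stems sum back to the original mixture.
assert np.allclose(vocals + drums + bass + other, mixture)
```

Taking "other" as the residual guarantees the stems sum exactly to the mixture, which is one plausible reason it scored higher than a dedicated other-stem model.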
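The per-stem blend values can be read as linear interpolation weights between the two ensembles' outputs. Interpreting "blend" as a sample-wise weighted average is my assumption; the scalar stand-ins below are illustrative, not real audio.

```python
# Final per-stem blend weights from the summary above
# (0 = 100% Demucs, 1 = 100% MDX-Net), order: drums, bass, other, vocals.
BLEND = {"drums": 0.08, "bass": 0.08, "other": 0.4, "vocals": 0.88}

def blend(demucs_sample, mdxnet_sample, w):
    """Linear mix of the two models' outputs (assumed interpretation);
    in practice this would be applied sample-wise to whole waveforms."""
    return (1.0 - w) * demucs_sample + w * mdxnet_sample

# Toy scalar stand-ins for one audio sample from each ensemble's output.
demucs_out = {k: 1.0 for k in BLEND}
mdx_out = {k: 3.0 for k in BLEND}

final = {k: blend(demucs_out[k], mdx_out[k], BLEND[k]) for k in BLEND}
# e.g. drums: 0.92 * 1.0 + 0.08 * 3.0 = 1.16
```

The weights show Demucs dominating drums and bass, while the MDX-Net models carry most of the vocals estimate.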
Dataset details
For the vocals.onnx model, the dataset contained 2000 acapellas and instrumentals.
bass.pt was trained on MUSDB18 + 200 extra bass and bassless tracks.
drums.pt was trained on MUSDB18 + 150 extra drums and drumless tracks.
other.pt was trained on MUSDB18 + 100 extra other and otherless tracks.
Training code for Kuielab MDX-Net with modified model settings: GitHub - KimberleyJensen/mdx-net at mdx23