GitHub - KimberleyJensen/kmdx-net_music-source-separation - Submission code with models download
Model Summary
- Models
- Ensemble of Demucs and Kuielab mdx-net
- Demucs models (all trained by Meta)
- 4 x htdemucs_ft
- demucs_mmi
- a1d90b5c
- Kuielab MDX-Net models (all trained by me on a single 16 GB T4 GPU)
- vocals.onnx (trained with BatchNorm2d; the vocals model is an ONNX file because it was trained before I realized PyTorch models were faster, and I deleted the checkpoint)
- bass.pt (trained with GroupNorm, num_groups=4)
- other.pt (trained with GroupNorm, num_groups=2)
- drums.pt (trained with GroupNorm, num_groups=2)
- Things I did to boost SDR
- Modify the input mixtures for each model
- input mixture for htdemucs_ft vocals = the original mixture
- input mixture for htdemucs_ft drums = original mixture - output of htdemucs_ft vocals
- input mixture for htdemucs_ft bass = original mixture - outputs of htdemucs_ft vocals and drums
- input mixture for bass.pt, drums.pt and other.pt = original mixture - output of vocals.onnx
- Using the original mixture minus the htdemucs_ft vocals, drums and bass outputs as the other stem scored higher than using the actual other-stem model.
- Blending the model outputs
- First I blended the Demucs model outputs together, then blended the Demucs outputs with the MDX-Net model outputs
- [0.08, 0.08, 0.4, 0.88] was the final blend between Demucs and MDX-Net (0 means 100% Demucs, 1 means 100% MDX-Net), in the order drums, bass, other, vocals.
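The input-mixture cascade above can be sketched as follows. This is a minimal illustration, not the repo's code: the `fake_model` closures are toy stand-ins for the real htdemucs_ft inference calls, and the fractions are arbitrary.

```python
import numpy as np

# Toy "separators": each returns a fixed fraction of its input, standing in
# for real model inference (e.g. htdemucs_ft). Purely illustrative.
def fake_model(fraction):
    return lambda mix: fraction * mix

htdemucs_vocals = fake_model(0.4)
htdemucs_drums = fake_model(0.3)
htdemucs_bass = fake_model(0.2)

mixture = np.random.randn(2, 44100)  # stereo, 1 s at 44.1 kHz

# The cascade: each model sees the mixture with the previously
# extracted stems subtracted out.
vocals = htdemucs_vocals(mixture)
drums = htdemucs_drums(mixture - vocals)
bass = htdemucs_bass(mixture - vocals - drums)

# "other" is taken as the residual rather than a dedicated model's output.
other = mixture - vocals - drums - bass

# By construction the four stems sum back to the original mixture.
assert np.allclose(vocals + drums + bass + other, mixture)
```

Taking "other" as the residual guarantees the stems sum exactly to the mixture, which is one plausible reason it scored higher than a dedicated other-stem model.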
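The per-stem blend values can be read as linear interpolation weights between the two ensembles' outputs. Interpreting "blend" as a sample-wise weighted average is my assumption; the scalar stand-ins below are illustrative, not real audio.

```python
# Final per-stem blend weights from the summary above
# (0 = 100% Demucs, 1 = 100% MDX-Net), order: drums, bass, other, vocals.
BLEND = {"drums": 0.08, "bass": 0.08, "other": 0.4, "vocals": 0.88}

def blend(demucs_sample, mdxnet_sample, w):
    """Linear mix of the two models' outputs (assumed interpretation);
    in practice this would be applied sample-wise to whole waveforms."""
    return (1.0 - w) * demucs_sample + w * mdxnet_sample

# Toy scalar stand-ins for one audio sample from each ensemble's output.
demucs_out = {k: 1.0 for k in BLEND}
mdx_out = {k: 3.0 for k in BLEND}

final = {k: blend(demucs_out[k], mdx_out[k], BLEND[k]) for k in BLEND}
# e.g. drums: 0.92 * 1.0 + 0.08 * 3.0 = 1.16
```

The weights show Demucs dominating drums and bass, while the MDX-Net models carry most of the vocals estimate.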
Dataset details
For the vocals.onnx model, the dataset contained 2000 acapellas and instrumentals.
bass.pt was trained on MUSDB18 + 200 extra bass and bassless tracks.
drums.pt was trained on MUSDB18 + 150 extra drums and drumless tracks.
other.pt was trained on MUSDB18 + 100 extra other and otherless tracks.
Training code for Kuielab MDX-Net with modified model settings: GitHub - KimberleyJensen/mdx-net at mdx23