Here is a summary of good material to get started with music source separation:
- https://sigsep.github.io: Good starting points with overview of available datasets, tutorials, …
- https://sisec18.unmix.app/: Website with results from SiSEC 2018 (previous iteration of this challenge). The results of the challenge are summarized in this paper.
Papers and Tutorials
- Ethan Manilow, Prem Seetharaman, and Justin Salamon: “Open Source Tools & Data for Music Source Separation” website
- Rafii, Zafar, et al. “An overview of lead and accompaniment separation in music.” IEEE/ACM Transactions on Audio, Speech, and Language Processing 26.8 (2018): 1307-1335. (PDF)
- Cano, Estefania, et al. “Musical source separation: An introduction.” IEEE Signal Processing Magazine 36.1 (2018): 31-40. (PDF)
Description of the Baselines
- Stöter, Fabian-Robert, et al. “Open-Unmix - A reference implementation for music source separation.” Journal of Open Source Software 4.41 (2019): 1667. (PDF)
- Sawata, Ryosuke, et al. “All for One and One for All: Improving Music Separation by Bridging Networks.” ICASSP 2021. (PDF)
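Both baselines are spectrogram-masking models: a network estimates per-source magnitudes, which are turned into soft (ratio) masks and applied to the mixture STFT. Below is a minimal sketch of that masking step, assuming SciPy; the function name, its parameters, and the idea of passing in pre-computed magnitude estimates are our own illustration, with the actual magnitude estimation left to whatever model you train:

```python
import numpy as np
from scipy.signal import stft, istft

def soft_mask_separate(mixture, est_mags, n_fft=4096, hop=1024, eps=1e-8):
    """Separate a mono mixture using soft (ratio) masks.

    mixture  : 1-D time-domain signal
    est_mags : list of estimated magnitude spectrograms, one per source,
               each shaped like the mixture's STFT magnitude
    """
    _, _, X = stft(mixture, nperseg=n_fft, noverlap=n_fft - hop)
    total = sum(est_mags) + eps           # avoid division by zero
    sources = []
    for mag in est_mags:
        mask = mag / total                # Wiener-like ratio mask in [0, 1]
        _, s = istft(mask * X, nperseg=n_fft, noverlap=n_fft - hop)
        sources.append(s)
    return sources
```

When the magnitude estimates are accurate, the masks concentrate each source's energy in its own time-frequency bins; Open-Unmix additionally refines this with a multichannel Wiener filter at inference time.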
OSS Source Separation Tools
To see which models perform well on MUSDB18, have a look at Papers with Code. Here is a (non-exhaustive) list of good OSS models:
- https://github.com/asteroid-team/asteroid : Asteroid
- https://github.com/facebookresearch/demucs : Demucs
- https://github.com/nussl/nussl : NUSSL
- https://github.com/deezer/spleeter : Spleeter
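The MUSDB18 rankings on Papers with Code are reported in terms of SDR (signal-to-distortion ratio). As a sketch, the simple "global" form of SDR can be computed as follows; the `eps` guard and the function name are our own additions, and the full BSSEval metric used by `museval` is more involved:

```python
import numpy as np

def sdr(reference, estimate, eps=1e-9):
    """Global signal-to-distortion ratio in dB:
    10 * log10(||s||^2 / ||s - s_hat||^2)."""
    num = np.sum(reference ** 2)
    den = np.sum((reference - estimate) ** 2)
    return 10 * np.log10((num + eps) / (den + eps))
```

Higher is better: a perfect estimate yields a very large SDR, while halving the amplitude of an otherwise perfect estimate costs about 6 dB.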
Multitrack datasets for training new models
Leaderboard A (only MUSDB18HQ/MUSDB18 allowed)
- MUSDB18HQ: the dataset allowed for training Leaderboard A models
- MedleyDB Creative Commons Multi-track files.
- Slakh2100 Synthesized Instrumentals
- DAMP Vocal/Accompaniment
- MIR-1K Vocal/Accompaniment
Ideas for improving the baselines
Here are some ideas that we think are worth investigating to improve the baselines:
- Use more data (see, e.g., https://sigsep.github.io/datasets/, http://www.slakh.com/, …)
- Use more data augmentation during training than the traditional techniques described here, e.g., pitch-shifting, time-stretching, …, as was done, e.g., here.
- Blend several models as described in this paper.
- Use optimized hyperparameters for each instrument
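The traditional augmentations mentioned above can be sketched with NumPy. The gain range and random channel swap below follow the recipe used in Open-Unmix-style training, but the function name and array shapes are our own assumptions:

```python
import numpy as np

def augment_sources(sources, rng):
    """Apply standard source-separation training augmentations (a sketch).

    sources : array of shape (n_sources, n_channels, n_samples)
    rng     : a numpy Generator

    Remixing sources from different songs, another common augmentation,
    is done by the caller simply by changing which sources are stacked.
    """
    out = sources.copy()
    n_src = out.shape[0]
    # random gain in [0.25, 1.25] per source
    gains = rng.uniform(0.25, 1.25, size=(n_src, 1, 1))
    out = out * gains
    # randomly swap left/right channels per source
    for i in range(n_src):
        if rng.random() < 0.5:
            out[i] = out[i, ::-1]
    # the new mixture is simply the sum of the augmented sources
    return out, out.sum(axis=0)
```

Because the mixture is recomputed from the augmented stems, the training targets stay consistent with the input, which is what makes these augmentations cheap to apply on the fly.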