Making new music available for demixing research

Hello again!
I have a friend who created a project for recording royalty-free music for use in live streaming (e.g., game streaming on Twitch.tv). The artists are paid, but the music is open. Here are the links to the project:

I described the MDX to him, and he’s excited to also share the stem files from his project, free and open for all demixing and music research.

I’ve never created an academic dataset before. What sort of steps should I follow to ensure we do this properly? My first thought is to upload a zip file to a static GitHub Pages website.

I expect that as the project grows, the dataset zip will grow to contain the new stems (I’ll probably update it periodically to include the new tracks).

Maybe I can call it the “OnAir.Music-stems-v1”, v2, v3, etc.
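The versioned-zip idea is easy to automate. Here's a minimal sketch using only the Python standard library, assuming a layout of one directory per track with WAV stems inside (the function name and directory layout are my invention, not part of the actual repo):

```python
import zipfile
from pathlib import Path

def package_release(stem_root: str, version: int, out_dir: str = ".") -> str:
    """Bundle every WAV stem under stem_root into a versioned zip archive."""
    archive = Path(out_dir) / f"OnAir.Music-stems-v{version}.zip"
    # ZIP_STORED skips deflate, which buys little on raw audio data anyway.
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_STORED) as zf:
        for wav in sorted(Path(stem_root).rglob("*.wav")):
            # Keep paths relative to the root so the zip unpacks cleanly.
            zf.write(wav, wav.relative_to(stem_root))
    return str(archive)
```

Re-running this with a bumped version number after each batch of new tracks would produce the v1, v2, v3 archives described above.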


Having such a new dataset would be great :+1:

@faroit: You have more experience in creating such a dataset. What would you recommend to @sevagh?

I went ahead and created the first upload: https://github.com/OnAir-Music/OnAir-Music-Dataset

There is only one track/stem set for now. But there will be more coming!


After a slow start, we now have 7 full tracks with stems available in the v2 zip. Going forward, new stems will arrive as soon as the tracks are published on Spotify, and we anticipate many more to come.

The GitHub release method with LFS seems to be appropriate for now.


I’m about to release 2 new tracks, and I’m now making use of the GitHub discussions feature on the main repo.

Until now I have been releasing the stems in whatever form the artists deliver them, which is not consistent. Here’s a GitHub discussion to talk about choosing a better, more consistent stem format: https://github.com/OnAir-Music/OnAir-Music-Dataset/discussions/3

One example is the 4 targets of MUSDB18-HQ. Collaborators are welcome! (I can’t decide this on my own.)
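To make the discussion concrete, here's a sketch of what a mapping onto the 4 MUSDB18-HQ targets could look like. The stem names below are made-up examples for illustration, not the dataset's actual labels:

```python
MUSDB_TARGETS = ("drums", "bass", "vocals", "other")

# Hypothetical artist-supplied stem names -> MUSDB target.
# Anything unlisted falls through to "other".
STEM_TO_TARGET = {
    "kick": "drums",
    "snare": "drums",
    "percussion": "drums",
    "bass guitar": "bass",
    "808": "bass",
    "lead vocal": "vocals",
    "backing vocals": "vocals",
}

def target_for(stem_name: str) -> str:
    """Normalize the name, then look it up; default to 'other'."""
    return STEM_TO_TARGET.get(stem_name.lower().strip(), "other")
```

The awkward part is that this table has to grow with every new artist's naming habits, which is exactly why a consistent stem format at submission time would help.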


I recently created a Python loader, similar to sigsep-mus-db, for the OnAir dataset: https://github.com/OnAir-Music/onair-py

It maps the custom stems into the 4 musdb targets (drums/bass/vocals/other) as best I could manage. Some cases were tricky, e.g. LoFi hip-hop tracks that have faint ambient vocals and no singing.
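One sanity check worth wiring into any such mapping is the MUSDB convention that the mixture is (approximately) the sum of the four targets. A sketch with NumPy, independent of the actual onair-py API (the function name and array shapes are my assumptions):

```python
import numpy as np

def check_additivity(mixture: np.ndarray, targets: dict, tol: float = 1e-6) -> bool:
    """Verify mixture ~= drums + bass + vocals + other, elementwise.

    mixture and each targets[name] are audio arrays of identical shape,
    e.g. (num_samples, num_channels).
    """
    recon = sum(targets[t] for t in ("drums", "bass", "vocals", "other"))
    return bool(np.max(np.abs(mixture - recon)) < tol)
```

If a track fails this check, either a stem was dropped by the mapping or the artist's mixdown applied bus processing that the stems alone can't reproduce — both useful things to catch before anyone trains on the data.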