Oracle baselines

sevagh · July 22, 2021, 7:41pm

Hello - I thought I saw a topic or message about submitting ideal ratio mask or other oracle estimators as a baseline.

It seems like a pretty cool idea. Would this be possible? Such methods would need access to the ground truth of the hidden test set.

errorfixrepeat · July 23, 2021, 8:00am

+1 on this.

It’d be helpful to know if STFT magnitude mask estimation is enough to score highly.

StefanUhlich · July 23, 2021, 10:41am

Hello @sevagh and @errorfixrepeat,

We will include the ideal single-channel Wiener filter (= ideal ratio mask) and the ideal multi-channel Wiener filter on MDXDB21 (the hidden dataset) in the paper that describes the challenge.
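For readers unfamiliar with this oracle, here is a minimal sketch of the single-channel case. It assumes the ground-truth stems are already available as complex STFTs and that the mask is the power ratio |S_j|^2 / sum_k |S_k|^2, which is one common definition of the ideal ratio mask (single-channel Wiener gain). The function name, array layout, and `eps` are illustrative and not taken from the challenge code.

```python
import numpy as np

def ideal_ratio_mask_oracle(source_stfts, mixture_stft, eps=1e-10):
    """Oracle estimates from a power-ratio mask (hypothetical helper).

    source_stfts: complex array, shape (n_sources, n_freq, n_frames)
    mixture_stft: complex array, shape (n_freq, n_frames)
    Returns complex estimates with the same shape as source_stfts.
    """
    # Per-source power spectrogram, computed from the ground truth.
    power = np.abs(source_stfts) ** 2
    # Soft masks that sum to (almost exactly) one over the sources.
    masks = power / (power.sum(axis=0, keepdims=True) + eps)
    # Apply each real-valued mask to the complex mixture STFT,
    # so every estimate inherits the mixture phase.
    return masks * mixture_stft

# Toy usage with random "spectrograms" standing in for real stems.
rng = np.random.default_rng(0)
stems = rng.standard_normal((4, 8, 10)) + 1j * rng.standard_normal((4, 8, 10))
mixture = stems.sum(axis=0)
estimates = ideal_ratio_mask_oracle(stems, mixture)
```

An inverse STFT of each estimate would then give the time-domain signals to score; the multi-channel variant additionally uses spatial covariances and is not shown here.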

In the meantime, here are the scores on the 18 songs used for round 2, for your reference:

SDR_song for ideal single-channel Wiener filter (oracle method): 9.688 dB
SDR_song for ideal multi-channel Wiener filter (oracle method): 9.895 dB

Unfortunately, it is not easy for me to compute per-instrument scores when evaluating only on the 18 songs of round 2. The paper will also report the scores for each instrument individually.

Kind regards

Stefan

sevagh · July 23, 2021, 2:58pm

Very cool. Has the “ideal phase-mix oracle” ever been explored? I made a GitHub issue about this in the Open-Unmix repo once: https://github.com/sigsep/open-unmix-pytorch/issues/83

Combine the ground-truth stem magnitude with the mix phase to create the “mix-phase oracle”. Does that idea have any merit?

StefanUhlich · July 23, 2021, 5:45pm

Yes, this would also be interesting. You are right that the DNNs estimate the amplitude, and the “ideal phase-mix oracle” would give us an upper bound for a system that does no post-processing (such as single-/multi-channel WF) and instead goes directly back to the time domain.
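A minimal sketch of that oracle, assuming the stem and the mixture are given as complex STFTs of the same shape. The function name is hypothetical; the idea is simply to keep the ground-truth magnitude and borrow the phase from the mixture.

```python
import numpy as np

def mix_phase_oracle(source_stft, mixture_stft):
    """Ground-truth magnitude combined with the mixture phase (hypothetical helper).

    Both inputs are complex arrays of identical shape (n_freq, n_frames).
    """
    # Phase of the mixture in radians.
    mixture_phase = np.angle(mixture_stft)
    # Magnitude of the true stem, re-attached to the mixture phase.
    return np.abs(source_stft) * np.exp(1j * mixture_phase)

# Toy usage with random "spectrograms" standing in for real audio.
rng = np.random.default_rng(0)
stem = rng.standard_normal((8, 10)) + 1j * rng.standard_normal((8, 10))
other = rng.standard_normal((8, 10)) + 1j * rng.standard_normal((8, 10))
estimate = mix_phase_oracle(stem, stem + other)
```

Taking an inverse STFT of the result and scoring it against the true stem would give the upper bound discussed above for magnitude-only systems without Wiener post-processing.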

StefanUhlich · August 26, 2021, 5:57am

@sevagh @errorfixrepeat Just for your reference, here are the ideal SWF and ideal MWF scores on all 27 songs:

sevagh · August 26, 2021, 2:13pm

Very nice, thanks. On another note, the “mix-phase” oracle I mentioned above appears in the literature under the name “noisy phase”:

https://arxiv.org/pdf/1907.01160.pdf

When estimating a time-frequency (T-F) mask that modifies the mixture signal magnitude and uses the noisy mixture phase for resynthesis, the phase-sensitive mask [2] can help compensate for these noisy phase errors.
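For context, a minimal sketch of a phase-sensitive mask, assuming the usual definition |S| / |X| · cos(θ_S − θ_X) for a source STFT S and mixture STFT X. The function name, `eps`, and the optional clipping to [0, 1] (a common truncated variant) are assumptions for illustration, not details from the quoted paper.

```python
import numpy as np

def phase_sensitive_mask(source_stft, mixture_stft, eps=1e-10, clip=True):
    """Real-valued mask that shrinks where source and mixture phases disagree."""
    # Magnitude ratio between the true source and the mixture.
    ratio = np.abs(source_stft) / (np.abs(mixture_stft) + eps)
    # Cosine of the phase difference: 1 when aligned, negative when opposed.
    phase_term = np.cos(np.angle(source_stft) - np.angle(mixture_stft))
    mask = ratio * phase_term
    if clip:
        # Assumed truncation so the mask can be applied like a ratio mask.
        mask = np.clip(mask, 0.0, 1.0)
    return mask

# Toy usage: the masked mixture keeps the mixture ("noisy") phase.
rng = np.random.default_rng(0)
stem = rng.standard_normal((8, 10)) + 1j * rng.standard_normal((8, 10))
mixture = stem + rng.standard_normal((8, 10)) + 1j * rng.standard_normal((8, 10))
estimate = phase_sensitive_mask(stem, mixture) * mixture
```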

https://source-separation.github.io/tutorial/basics/phase.html

For a mask-based source separation approach, an easy and very common way to deal with phase is to just copy the phase from the mixture! The mixture phase is sometimes referred to as the noisy phase. This strategy isn’t perfect, but researchers have discovered that it works surprisingly well, and when things go wrong, it’s usually not the fault of the phase.

The term “noise” seems to come from speech demixing (speech + noise), so I still prefer “mix-phase” for the music demixing case: interfering musical instruments are not noise, but music!