Oracle baselines

Hello - I thought I saw a topic or message about submitting the ideal ratio mask or other oracle estimators as baselines.

It seems like a pretty cool idea. Would this be possible? Such methods would need access to the ground truth of the hidden test set.

+1 on this.

It’d be helpful to know if STFT magnitude mask estimation is enough to score highly.

Hello @sevagh and @errorfixrepeat,

We will include the scores of the ideal single-channel Wiener filter (= ideal ratio mask) and the ideal multi-channel Wiener filter on MDXDB21 (the hidden dataset) in the paper that describes the challenge.
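The following minimal Python sketch shows roughly what the ideal single-channel Wiener filter (ideal ratio mask) oracle computes. It rests on several assumptions: NumPy/SciPy, ground-truth stems stacked as an array of shape (n_sources, n_channels, n_samples), a mixture that is the plain sum of the stems, and SciPy's default Hann STFT. The function name and `nperseg` value are hypothetical choices, not the challenge's evaluation code. Each channel is filtered independently, so only the single-channel variant is shown.

```python
import numpy as np
from scipy.signal import stft, istft

def ideal_ratio_mask_oracle(sources, fs=44100, nperseg=4096, eps=1e-10):
    """Hypothetical oracle: ideal single-channel Wiener filter (ideal ratio mask).

    sources: ground-truth stems, shape (n_sources, n_channels, n_samples).
    Returns estimated stems with the same shape.
    """
    # Assumption: the mixture is the plain sum of the stems.
    mixture = sources.sum(axis=0)
    # STFT over the last axis -> complex arrays of shape (..., n_freqs, n_frames).
    _, _, S = stft(sources, fs=fs, nperseg=nperseg)
    _, _, X = stft(mixture, fs=fs, nperseg=nperseg)
    power = np.abs(S) ** 2
    # Wiener gain per source, channel and T-F bin: |S_j|^2 / sum_k |S_k|^2.
    masks = power / (power.sum(axis=0, keepdims=True) + eps)
    # A real-valued mask applied to the complex mixture STFT reuses the mixture phase.
    _, estimates = istft(masks * X, fs=fs, nperseg=nperseg)
    # Trim the padding that the STFT round trip adds at the end.
    return estimates[..., : sources.shape[-1]]
```

Because the masks of all sources sum to (almost) one in every T-F bin, the estimates add back up to the mixture, and every source reuses the mixture phase.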

For your reference, here are the scores on the 18 songs used for round 2:

  • SDR_song for ideal single-channel Wiener filter (oracle method): 9.688 dB
  • SDR_song for ideal multi-channel Wiener filter (oracle method): 9.895 dB

Unfortunately, it is not easy for me to compute the scores for each instrument individually when evaluating only on the 18 songs of round 2, but the paper will also report the scores for each instrument individually.
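Anyone who wants to compute numbers like SDR_song on their own data can start from the sketch below. It assumes a global SDR definition: one SDR per instrument, computed over all samples and channels of a song, then averaged over the instruments (e.g. bass, drums, other, vocals). The `eps` value and the function names are illustrative assumptions. Check the official evaluation code before comparing against the scores above.

```python
import numpy as np

def sdr_instrument(reference, estimate, eps=1e-7):
    """Global SDR in dB for one instrument over the whole song (all samples and channels).

    Assumed definition: 10 * log10(sum(s^2) / sum((s - s_hat)^2)), with eps for stability.
    """
    num = np.sum(reference ** 2) + eps
    den = np.sum((reference - estimate) ** 2) + eps
    return 10.0 * np.log10(num / den)

def sdr_song(references, estimates, eps=1e-7):
    """Mean of the per-instrument SDRs of one song (hypothetical helper)."""
    return float(np.mean([sdr_instrument(r, e, eps) for r, e in zip(references, estimates)]))
```

For example, an estimate that is the reference scaled by 0.9 leaves an error with 1% of the reference energy, which gives an SDR of about 20 dB.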

Kind regards


Very cool. Has the “ideal phase-mix oracle” ever been explored? I once opened a GitHub issue about this in the Open-Unmix repo:

Use ground truth stem + mix phase to create the “mix-phase oracle” - does that idea have any merit?

Yes, this would also be interesting. You are right that the DNNs estimate the magnitude, and the “ideal phase-mix oracle” would give us an upper bound for a system that does no post-processing (such as single- or multi-channel Wiener filtering) but goes directly back to the time domain.
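Below is a minimal sketch of the “mix-phase oracle” discussed here. It assumes NumPy/SciPy, the stem layout (n_sources, n_channels, n_samples), and a mixture equal to the sum of the stems. Each stem keeps its ground-truth STFT magnitude, takes the mixture phase, and is inverted directly without any Wiener post-processing. The function name and STFT settings are hypothetical.

```python
import numpy as np
from scipy.signal import stft, istft

def mix_phase_oracle(sources, nperseg=4096):
    """Hypothetical 'mix-phase' oracle: true stem magnitude + mixture phase.

    sources: ground-truth stems, shape (n_sources, n_channels, n_samples).
    """
    # Assumption: the mixture is the plain sum of the stems.
    mixture = sources.sum(axis=0)
    _, _, S = stft(sources, nperseg=nperseg)  # (n_sources, n_channels, n_freqs, n_frames)
    _, _, X = stft(mixture, nperseg=nperseg)  # (n_channels, n_freqs, n_frames)
    # Keep each stem's exact magnitude but replace its phase with the mixture phase
    # (X broadcasts over the source axis).
    Y = np.abs(S) * np.exp(1j * np.angle(X))
    # No Wiener filtering: invert each estimate directly.
    _, estimates = istft(Y, nperseg=nperseg)
    return estimates[..., : sources.shape[-1]]
```

Unlike the ratio-mask oracle, these estimates generally do not sum back to the mixture. True magnitudes combined with one shared phase do not add up to the mixture STFT.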

@sevagh @errorfixrepeat Just for your reference, here are the ideal SWF and ideal MWF scores on all 27 songs:


Very nice, thanks. On another note, the “mix-phase” oracle I was talking about above seems to show up in the literature as the “noisy phase”:

When estimating a time-frequency (T-F) mask that modifies the mixture signal magnitude and uses the noisy mixture phase for resynthesis, the phase-sensitive mask [2] can help compensate for these noisy phase errors.
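The phase-sensitive mask in the quote is commonly defined as |S|/|X| cos(θ_S - θ_X), which equals Re(S X*)/|X|². This is the real-valued mask that minimizes |mask · X - S|² in each T-F bin. It therefore accounts for the phase error that remains when the mixture phase is reused. Below is a small NumPy sketch of its oracle version. Truncating the mask to [0, 1] is a common practical choice and an assumption here, and the function name is hypothetical.

```python
import numpy as np

def phase_sensitive_mask(S, X, eps=1e-10, clip=True):
    """Oracle phase-sensitive mask: |S|/|X| * cos(angle(S) - angle(X)) = Re(S * conj(X)) / |X|^2.

    S: complex STFT of the target source; X: complex STFT of the mixture (same shape).
    """
    mask = np.real(S * np.conj(X)) / (np.abs(X) ** 2 + eps)
    if clip:
        # Assumption: truncate to [0, 1] so the mask stays bounded, as is often done.
        mask = np.clip(mask, 0.0, 1.0)
    return mask

# Usage: the masked estimate keeps the mixture (noisy) phase.
# estimate = phase_sensitive_mask(S, X) * X
```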

For a mask-based source separation approach, an easy and very common way to deal with phase is to just copy the phase from the mixture! The mixture phase is sometimes referred to as the noisy phase. This strategy isn’t perfect, but researchers have discovered that it works surprisingly well, and when things go wrong, it’s usually not the fault of the phase.

Since the term “noise” seems to come from speech demixing (speech + noise), I still prefer “mix-phase” for music demixing: interfering musical instruments are not noise, but music!