We will include the ideal single-channel Wiener filter (= ideal ratio mask) and the ideal multi-channel Wiener filter for MDXDB21 (the hidden dataset) in the paper that describes the challenge.
For your reference, here are the scores on the 18 songs used for round 2:
SDR_song for ideal single-channel Wiener filter (oracle method): 9.688 dB
SDR_song for ideal multi-channel Wiener filter (oracle method): 9.895 dB
Unfortunately, it is not easy for me to compute the per-instrument scores when evaluating only on the 18 songs of round 2; the paper will also report the scores for each instrument individually.
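For readers who want to reproduce this kind of oracle on their own data, here is a minimal sketch of the single-channel case. It is not the organizers' evaluation code: it assumes mono signals, uses SciPy's STFT with an arbitrary window length, builds the mask from power spectrograms, and scores with a plain energy-ratio SDR. The function names `irm_oracle` and `sdr` and all parameter values are my own choices for illustration.

```python
import numpy as np
from scipy.signal import stft, istft


def irm_oracle(sources, fs=44100, nperseg=4096, eps=1e-10):
    """Oracle single-channel Wiener filter (ratio mask on power spectrograms).

    sources: array of shape (n_sources, n_samples), mono (assumption).
    Returns the separated estimates with the same shape.
    """
    mixture = sources.sum(axis=0)
    _, _, X = stft(mixture, fs=fs, nperseg=nperseg)   # (freq, time)
    _, _, S = stft(sources, fs=fs, nperseg=nperseg)   # (n_sources, freq, time)
    power = np.abs(S) ** 2
    # Each source gets its share of the total power in every T-F bin.
    mask = power / (power.sum(axis=0, keepdims=True) + eps)
    _, est = istft(mask * X, fs=fs, nperseg=nperseg)
    # istft may return a few padded samples; trim to the input length.
    return est[..., : sources.shape[-1]]


def sdr(reference, estimate, eps=1e-10):
    """Energy-ratio SDR in dB (assumed definition, not the official scorer)."""
    num = np.sum(reference ** 2) + eps
    den = np.sum((reference - estimate) ** 2) + eps
    return 10 * np.log10(num / den)


# Toy example: two sinusoids that barely overlap in frequency.
fs = 8000
t = np.arange(fs) / fs
toy_sources = np.stack([np.sin(2 * np.pi * 440 * t),
                        np.sin(2 * np.pi * 3000 * t)])
toy_estimates = irm_oracle(toy_sources, fs=fs, nperseg=1024)
toy_scores = [sdr(s, e) for s, e in zip(toy_sources, toy_estimates)]
```

On real music the sources overlap much more than in this toy example, which is why oracle scores stay around the values quoted above rather than being arbitrarily high.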
Yes, this would also be interesting. You are right that the DNNs estimate the amplitude, so the “ideal phase-mix oracle” would give an upper baseline for a system that does no post-processing (like single-/multi-channel WF) but goes directly back to the time domain.
Very nice, thanks. On another note, the “mixed-phase” oracle I was talking about above seems to be mentioned in the literature as the “noisy phase”:
When estimating a time-frequency (T-F) mask that modifies the
mixture signal magnitude and uses the noisy mixture phase for
resynthesis, the phase-sensitive mask [2] can help compensate
for these noisy phase errors.
For a mask-based source separation approach, an easy and very common way to deal with phase is to just copy the phase from the mixture! The mixture phase is sometimes referred to as the noisy phase. This strategy isn’t perfect, but researchers have discovered that it works surprisingly well, and when things go wrong, it’s usually not the fault of the phase.
Since “noise” seems to come from speech demixing (speech + noise), I still prefer the term “mix-phase” for the music demixing case, since interfering musical instruments are not noise, but music!
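To make the mix-phase oracle discussed in this thread concrete, here is a minimal sketch: take the true magnitude of each source, attach the phase of the mixture, and invert the STFT. It assumes mono signals and SciPy's STFT; the function name `mix_phase_oracle` and the window length are hypothetical choices of mine, not something from the challenge code.

```python
import numpy as np
from scipy.signal import stft, istft


def mix_phase_oracle(sources, fs=44100, nperseg=4096):
    """True source magnitudes combined with the mixture phase.

    sources: array of shape (n_sources, n_samples), mono (assumption).
    Returns time-domain estimates with the same shape.
    """
    mixture = sources.sum(axis=0)
    _, _, X = stft(mixture, fs=fs, nperseg=nperseg)   # (freq, time)
    _, _, S = stft(sources, fs=fs, nperseg=nperseg)   # (n_sources, freq, time)
    # Keep |S| exactly, but replace the phase of S with the phase of X.
    est_spec = np.abs(S) * np.exp(1j * np.angle(X))
    _, est = istft(est_spec, fs=fs, nperseg=nperseg)
    # istft may return a few padded samples; trim to the input length.
    return est[..., : sources.shape[-1]]


# Sanity check: with a single source the mixture phase is the source phase,
# so the oracle should return the input unchanged (up to numerical error).
rng = np.random.default_rng(0)
single_source = rng.standard_normal((1, 8000))
single_estimate = mix_phase_oracle(single_source, fs=8000, nperseg=512)
```

With several overlapping sources the estimate differs from the reference only through the phase error, so the resulting score isolates how much is lost by reusing the mixture phase.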