📹 Sound Demixing Townhall Recording & Presentation

:wave: Hello,

Last Sunday, we hosted the challenge townhall. :clapper: It featured presentations from several of the challenge organisers.

Stefan Uhlich, Researcher at Sony R&D, gave a walkthrough of the baseline and explained new approaches for the challenge. Giorgio Fabbro, Engineer at Sony R&D, also joined to answer questions.

Igor Gadelha, Head of Data Science at Moises.ai, presented the datasets for both tracks. Gordon Wichern, Senior Principal Research Scientist at Mitsubishi Electric Research Laboratories (MERL), previewed the current and upcoming rounds.

:video_camera: Watch the townhall over here. Got follow-up questions? Drop them in the comments below.

:inbox_tray: Download the slide deck over here: [Town Hall] Sound Demixing Challenge 2023.pdf - Google Drive


Hi, this is a follow-up to the last question at 1:08:50 (I was the one who asked about GPU memory).
My 24 GB card still runs out of memory (OOM) with batch size = 2.
I managed to run the training with smaller window sizes [512, 1024, 4096], but performance is not on par with the original window sizes [1024, 2048, 8192].
Any advice on how to run the original training on this smaller card?

Hi @yukiya_valentine,

you have various options to reduce the memory footprint of your model; these are the first two that come to mind:

  • you could run the training in "mixed precision", where most operations on the GPU run in float16 instead of float32, so the activations they produce occupy half the usual memory; PyTorch has an automatic way to activate this: Automatic Mixed Precision package - torch.amp (PyTorch 1.13 documentation)
  • you could reduce the batch size even further, to 1, and then simulate a larger batch size by accumulating the gradients (that is, you update your parameters once every N forward/backward passes instead of after every pass)
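A minimal sketch of both suggestions combined in one training loop, assuming a generic PyTorch model trained with an MSE loss on (mixture, target) pairs of batch size 1. The function name `train_with_amp_and_accumulation`, the Adam optimizer, and the learning rate are hypothetical illustration choices, not the challenge baseline's training code. It uses `torch.autocast` and `torch.cuda.amp.GradScaler` (the 1.13-era API; newer releases also expose it as `torch.amp.GradScaler("cuda")`) and falls back to plain float32 when no CUDA device is present.

```python
import torch
from torch import nn


def train_with_amp_and_accumulation(model, batches, accumulation_steps=4, lr=1e-3):
    """Hypothetical sketch: float16 autocast on CUDA plus gradient accumulation.

    `batches` yields (mixture, target) tensor pairs, each with batch size 1.
    """
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    use_amp = device.type == "cuda"  # autocast and GradScaler are disabled on CPU (plain float32)
    amp_dtype = torch.float16 if use_amp else torch.bfloat16  # ignored when autocast is disabled
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)  # scales the loss to avoid fp16 underflow
    loss_fn = nn.MSELoss()
    losses = []

    optimizer.zero_grad(set_to_none=True)
    for step, (mixture, target) in enumerate(batches, start=1):
        mixture, target = mixture.to(device), target.to(device)
        # Forward pass: eligible ops (matmuls, convolutions) run in float16 under autocast.
        with torch.autocast(device_type=device.type, dtype=amp_dtype, enabled=use_amp):
            loss = loss_fn(model(mixture), target)
        # Divide by N so the N accumulated gradients sum to the gradient of the mean loss.
        scaler.scale(loss / accumulation_steps).backward()
        losses.append(loss.item())
        # Update the parameters only once every N forward/backward passes.
        if step % accumulation_steps == 0:
            scaler.step(optimizer)  # with AMP: unscales grads and skips the step on inf/NaN
            scaler.update()
            optimizer.zero_grad(set_to_none=True)
    # Note: a trailing group shorter than N is never applied in this sketch.
    return losses
```

Dividing each loss by `accumulation_steps` makes one accumulated update match a single update on a batch of N examples (up to floating-point error). Layers that rely on batch statistics, such as batch normalization, still only see one example at a time, so the equivalence is not exact for those.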

Also, instead of reducing the window size (which affects the frequency resolution of your data), you could try making the sequence length (over time) during training a bit shorter: this reduces the overall memory consumption of the network on the GPU, although it usually comes at the cost of some performance.
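To make this trade-off concrete, here is a small arithmetic sketch. It assumes 44.1 kHz audio, a hop size of one quarter of the window, and hypothetical training segments of 6 s and 3 s; none of these values are taken from the baseline. The frame count follows the centered-STFT convention used by `torch.stft(center=True)`, i.e. `1 + samples // hop`.

```python
SAMPLE_RATE = 44_100  # assumption: 44.1 kHz audio


def hz_per_bin(n_fft: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Frequency spacing between STFT bins: larger windows give finer resolution."""
    return sample_rate / n_fft


def num_frames(seconds: float, hop: int, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of frames of a centered STFT (same count as torch.stft(center=True))."""
    return 1 + int(seconds * sample_rate) // hop


def spectrogram_cells(n_fft: int, hop: int, seconds: float) -> int:
    """Frequency bins x time frames of one mono magnitude spectrogram."""
    return (n_fft // 2 + 1) * num_frames(seconds, hop)


# Window size controls frequency resolution (Hz between neighbouring bins).
print(round(hz_per_bin(8192), 2), round(hz_per_bin(4096), 2))  # 5.38 10.77

# Segment length controls the number of frames, hence the tensor size.
# Hypothetical setting: n_fft=8192, hop=2048, 6 s vs 3 s training segments.
print(spectrogram_cells(8192, 2048, 6.0), spectrogram_cells(8192, 2048, 3.0))  # 532610 266305
```

Halving the segment length roughly halves the number of time frames, and with it the spectrogram-sized tensors the network has to process, while keeping the 8192-sample window's bin spacing; moving to a 4096-sample window instead doubles the spacing between frequency bins.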