Training Times

Really interesting competition here. Having joined the competition late and being less versed in the subject area, I’m trying to set reasonable objectives for what I can achieve.

I’ve started by trying to reproduce the X-UMX baseline and have set it running with the default parameters. Using a single 2080 Ti I’m managing ~11 epochs/hour. The default parameters run for 1000 epochs with effectively no early stopping (the early-stopping patience is 1000 epochs).

Are these numbers representative of the task? If I’m looking at ~3.8 days per model, I’m going to need to be very careful with my training runs.
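
For reference, the back-of-envelope arithmetic behind that estimate (assuming the ~11 epochs/hour I’m seeing holds for the whole run):

```python
# Rough training-time estimate, assuming ~11 epochs/hour on a single 2080 Ti
# and the default 1000-epoch budget (numbers from my run, not official figures).
epochs = 1000
epochs_per_hour = 11

hours = epochs / epochs_per_hour
days = hours / 24
print(f"{hours:.0f} hours (~{days:.1f} days) per full training run")
# -> roughly 91 hours, i.e. ~3.8 days
```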

I’m also noticing a significant difference between training and validation loss. I think it may be due to the discrepancy in segment duration (6 s for training, 80 s for validation), but I’m not sure. It’d be really helpful to see what a “good training run” looks like on this problem.

Hello @errorfixrepeat,

For X-UMX I do not have the numbers here, but for UMX you can find the training curves as well as the time per epoch in this file: training.md.

Kind regards

Stefan

Thanks Stefan, that’s definitely interesting. The different loss functions between UMX and X-UMX make a direct comparison challenging, although my 325 seconds per epoch compared to their 80 is a big difference.
If anyone else has any thoughts I’d certainly welcome them!

You should expect a longer training time for X-UMX as you train all four models together (they are combined into one larger model due to the bridging operations).
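
In case it helps to picture it, here is a very rough sketch of the bridging idea (my own simplified reading, not the actual X-UMX code): the four per-source networks average their intermediate hidden representations, so the whole thing trains as one larger joint model.

```python
import torch
import torch.nn as nn

# Minimal, simplified sketch of the bridging idea: four per-source branches
# share information by averaging their hidden states, so gradients from every
# source's loss reach every branch. Layer sizes here are illustrative only.
class TinyBridgedSeparator(nn.Module):
    def __init__(self, n_bins=2049, hidden=512,
                 sources=("bass", "drums", "other", "vocals")):
        super().__init__()
        self.sources = sources
        self.encoders = nn.ModuleDict({s: nn.Linear(n_bins, hidden) for s in sources})
        self.decoders = nn.ModuleDict({s: nn.Linear(hidden, n_bins) for s in sources})

    def forward(self, mag):  # mag: (batch, frames, n_bins) mixture magnitude
        hidden = {s: torch.tanh(self.encoders[s](mag)) for s in self.sources}
        # Bridging: replace each branch's hidden state with the cross-source average.
        bridged = torch.stack(list(hidden.values())).mean(dim=0)
        masks = {s: torch.sigmoid(self.decoders[s](bridged)) for s in self.sources}
        return {s: masks[s] * mag for s in self.sources}  # per-source magnitude estimates

model = TinyBridgedSeparator()
est = model(torch.rand(2, 100, 2049))
print({s: tuple(v.shape) for s, v in est.items()})
```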


Beyond the 4 models being combined (the 2080 Ti is a fast card, so it should handle it), I found that X-UMX includes an SDR loss (this version: https://github.com/JeffreyCA/spleeterweb-xumx).

The SDR loss/time-domain loss includes the iSTFT operation, and I think this significantly increases the cost of each gradient computation step.
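
To illustrate what I mean (a rough sketch, not the loss from the linked repo): a time-domain SDR-style loss has to run an iSTFT on the estimate, and that operation ends up inside the backward pass, which is what makes each step slower than a spectrogram-only MSE. The STFT parameters and SDR formula below are illustrative assumptions.

```python
import torch

# Sketch of a time-domain (SDR-style) loss: the estimated magnitude is combined
# with the mixture phase, inverted with torch.istft, and compared to the target
# waveform. Gradients flow back through the iSTFT, adding cost to each step.
def neg_sdr_loss(est_mag, mix_stft, target_wav, n_fft=4096, hop=1024):
    window = torch.hann_window(n_fft)
    phase = mix_stft / (mix_stft.abs() + 1e-8)   # reuse mixture phase
    est_wav = torch.istft(est_mag * phase, n_fft, hop, window=window,
                          length=target_wav.shape[-1])
    err = target_wav - est_wav
    # Simple SDR: signal power over error power; maximise SDR -> minimise -SDR.
    sdr = 10 * torch.log10(target_wav.pow(2).sum(-1) / (err.pow(2).sum(-1) + 1e-8) + 1e-8)
    return -sdr.mean()

# Dummy usage: gradients flow through torch.istft back to the magnitude estimate.
mix_wav = torch.randn(1, 44100 * 6)
target_wav = 0.5 * mix_wav
mix_stft = torch.stft(mix_wav, 4096, 1024, window=torch.hann_window(4096),
                      return_complex=True)
est_mag = mix_stft.abs().clone().requires_grad_()
loss = neg_sdr_loss(est_mag, mix_stft, target_wav)
loss.backward()
print(loss.item(), est_mag.grad.shape)
```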
