Really interesting competition here. Having joined late and being less versed in the subject area, I'm trying to set reasonable objectives for what I can achieve.
I’ve started by trying to reproduce the X-UMX baseline and have set it running with the default parameters. Using a single 2080 Ti I’m managing ~11 epochs/hour. The defaults run for 1000 epochs with effectively no early stopping (the early-stopping patience is also 1000 epochs).
Are these numbers representative of the task? If I’m looking at ~3.8 days per model I’m going to need to be very careful in my training runs.
I’m also noticing a significant gap between training and validation loss. I think it may be due to the discrepancy in segment duration (6 s train, 80 s valid), but I’m not sure. It’d be really helpful to see what a “good training run” looks like on this problem.
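One way I could sanity-check that hypothesis is to evaluate the long validation tracks in train-sized chunks, so both losses are computed over comparable durations. A minimal sketch (the function name, segment length, and identity-model test are my own assumptions, not part of the baseline code):

```python
import torch

def chunked_val_loss(model, mix, target, seg_len, loss_fn):
    """Hypothetical sketch: score a long validation track in
    fixed-length segments matching the training duration, so the
    train and validation losses are on a comparable footing.
    mix/target: (channels, time) tensors; seg_len in samples."""
    losses = []
    for start in range(0, mix.shape[-1] - seg_len + 1, seg_len):
        m = mix[..., start:start + seg_len].unsqueeze(0)
        t = target[..., start:start + seg_len].unsqueeze(0)
        with torch.no_grad():
            est = model(m)
        losses.append(loss_fn(est, t))
    return torch.stack(losses).mean()
```

If the gap shrinks when validating on 6 s chunks, the duration mismatch is the likely culprit rather than overfitting.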
Thanks Stefan, that’s definitely interesting. The different loss functions between UMX and X-UMX make a direct comparison challenging, although my 325 seconds per epoch compared to their 80 is a big difference.
If anyone else has any thoughts I’d certainly welcome them!
You should expect a longer training time for X-UMX as you train all four models together (they are combined into one larger model due to the bridging operations).
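For intuition, the bridging in X-UMX averages the intermediate representations across the four per-source branches and feeds the average back to each of them, which is what ties the four models into one network. A rough sketch of that crossing operation (shapes and naming are my assumptions, not the actual implementation):

```python
import torch

def bridge_average(hidden_states):
    """Hypothetical sketch of an X-UMX-style bridging operation:
    average the intermediate features of the per-source branches
    (e.g. vocals/drums/bass/other) and share the result with all
    of them, so the four models train jointly.
    hidden_states: list of (batch, frames, features) tensors."""
    avg = torch.stack(hidden_states).mean(dim=0)
    # Every branch continues from the shared average.
    return [avg for _ in hidden_states]
```

Because every branch's forward and backward pass now depends on all four, you pay roughly 4x the compute of a single UMX model per step.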
Beyond the 4 models being combined (the 2080 Ti is a fast card, so it should handle that), I found that X-UMX also adds an SDR loss (at least in this version: https://github.com/JeffreyCA/spleeterweb-xumx).
The SDR/time-domain loss includes the iSTFT operation, and I think backpropagating through that significantly increases the cost of the gradient step.
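To illustrate why: the loss has to reconstruct audio from the estimated spectrogram before comparing waveforms, so gradients flow through `torch.istft`. A minimal sketch of such a time-domain SDR-style loss (function name, FFT parameters, and the epsilon terms are my assumptions, not X-UMX's exact loss):

```python
import torch

def time_domain_sdr_loss(est_spec, ref_audio, n_fft=4096, hop=1024):
    """Hypothetical sketch: an SDR-style loss computed in the time
    domain. The iSTFT sits inside the loss, so backprop has to go
    through the inverse transform, which adds cost per step.
    est_spec: complex STFT estimate, shape (batch, freq, frames).
    ref_audio: reference waveform, shape (batch, time)."""
    window = torch.hann_window(n_fft, device=ref_audio.device)
    est_audio = torch.istft(est_spec, n_fft=n_fft, hop_length=hop,
                            window=window, length=ref_audio.shape[-1])
    # Negative SDR in dB (higher SDR -> lower loss).
    num = torch.sum(ref_audio ** 2, dim=-1)
    den = torch.sum((ref_audio - est_audio) ** 2, dim=-1) + 1e-8
    sdr = 10 * torch.log10(num / den + 1e-8)
    return -sdr.mean()
```

Compare that with a plain magnitude-spectrogram MSE, which never leaves the frequency domain, and the extra work per step is clear.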
I was using the Asteroid version of X-UMX. I tried a number of different things such as test time augmentation and retraining from scratch with more augmentation but wasn’t able to significantly improve my scores.
I could successfully run 4x inferences within the time limits.
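For anyone curious what "4x inference" as test-time augmentation might look like, here is a simple sketch: run the model on time-shifted copies of the mix and average the un-shifted estimates. The shift values and function name are my own choices for illustration, not what I actually submitted:

```python
import torch

def tta_separate(model, mix, shifts=(0, 2048, 4096, 8192)):
    """Hypothetical sketch of test-time augmentation for source
    separation: infer on time-shifted copies of the mix, undo each
    shift, and average. Four shifts -> 4x the inference cost.
    mix: (channels, time) tensor."""
    estimates = []
    for s in shifts:
        shifted = torch.roll(mix, shifts=s, dims=-1)
        with torch.no_grad():
            est = model(shifted.unsqueeze(0)).squeeze(0)
        estimates.append(torch.roll(est, shifts=-s, dims=-1))
    return torch.stack(estimates).mean(dim=0)
```

Averaging over shifts mainly smooths out frame-alignment artifacts from the STFT; in my runs it didn't move the scores much, consistent with what I said above.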