Really interesting competition here. Having joined late and being less versed in the subject area, I'm trying to set reasonable objectives for what I can achieve.
I’ve started by trying to reproduce the X-UMX baseline and have set it running with the default parameters. Using a single 2080 Ti I’m managing ~11 epochs/hour. The defaults run for 1000 epochs with effectively no early stopping (the early-stopping patience is also 1000 epochs).
Are these numbers representative of the task? If I’m looking at ~3.8 days per model I’m going to need to be very careful in my training runs.
I’m also noticing a significant gap between training and validation loss. I think it may be due to the discrepancy in segment duration (6 s train, 80 s valid), but I’m not sure. It’d be really helpful to see what a “good training run” looks like on this problem.
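One way I could sanity-check that hypothesis is to evaluate the long validation tracks in train-sized chunks, so both losses are computed over comparable durations. A minimal sketch (the function name, segment length, and identity-model test are my own assumptions, not part of the baseline code):

```python
import torch

def chunked_val_loss(model, mix, target, seg_len, loss_fn):
    """Hypothetical sketch: score a long validation track in
    fixed-length segments matching the training duration, so the
    train and validation losses are on a comparable footing.
    mix/target: (channels, time) tensors; seg_len in samples."""
    losses = []
    for start in range(0, mix.shape[-1] - seg_len + 1, seg_len):
        m = mix[..., start:start + seg_len].unsqueeze(0)
        t = target[..., start:start + seg_len].unsqueeze(0)
        with torch.no_grad():
            est = model(m)
        losses.append(loss_fn(est, t))
    return torch.stack(losses).mean()
```

If the gap shrinks when validating on 6 s chunks, the duration mismatch is the likely culprit rather than overfitting.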
Thanks Stefan, that’s definitely interesting. The different loss functions between UMX and X-UMX make a direct comparison challenging, although my 325 seconds per epoch compared to their 80 is a big difference.
If anyone else has any thoughts I’d certainly welcome them!
You should expect a longer training time for X-UMX as you train all four models together (they are combined into one larger model due to the bridging operations).
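For intuition, the bridging in X-UMX averages the intermediate representations across the four per-source branches and feeds the average back to each of them, which is what ties the four models into one network. A rough sketch of that crossing operation (shapes and naming are my assumptions, not the actual implementation):

```python
import torch

def bridge_average(hidden_states):
    """Hypothetical sketch of an X-UMX-style bridging operation:
    average the intermediate features of the per-source branches
    (e.g. vocals/drums/bass/other) and share the result with all
    of them, so the four models train jointly.
    hidden_states: list of (batch, frames, features) tensors."""
    avg = torch.stack(hidden_states).mean(dim=0)
    # Every branch continues from the shared average.
    return [avg for _ in hidden_states]
```

Because every branch's forward and backward pass now depends on all four, you pay roughly 4x the compute of a single UMX model per step.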
Beyond the 4 models being combined (the 2080 Ti is a fast card, so it should handle that), I found that X-UMX also adds an SDR loss (at least in this version: https://github.com/JeffreyCA/spleeterweb-xumx).
The SDR/time-domain loss includes the iSTFT operation, and I think backpropagating through that significantly increases the cost of the gradient step.
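To illustrate why: the loss has to reconstruct audio from the estimated spectrogram before comparing waveforms, so gradients flow through `torch.istft`. A minimal sketch of such a time-domain SDR-style loss (function name, FFT parameters, and the epsilon terms are my assumptions, not X-UMX's exact loss):

```python
import torch

def time_domain_sdr_loss(est_spec, ref_audio, n_fft=4096, hop=1024):
    """Hypothetical sketch: an SDR-style loss computed in the time
    domain. The iSTFT sits inside the loss, so backprop has to go
    through the inverse transform, which adds cost per step.
    est_spec: complex STFT estimate, shape (batch, freq, frames).
    ref_audio: reference waveform, shape (batch, time)."""
    window = torch.hann_window(n_fft, device=ref_audio.device)
    est_audio = torch.istft(est_spec, n_fft=n_fft, hop_length=hop,
                            window=window, length=ref_audio.shape[-1])
    # Negative SDR in dB (higher SDR -> lower loss).
    num = torch.sum(ref_audio ** 2, dim=-1)
    den = torch.sum((ref_audio - est_audio) ** 2, dim=-1) + 1e-8
    sdr = 10 * torch.log10(num / den + 1e-8)
    return -sdr.mean()
```

Compare that with a plain magnitude-spectrogram MSE, which never leaves the frequency domain, and the extra work per step is clear.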
I was using the Asteroid version of X-UMX. I tried a number of different things such as test time augmentation and retraining from scratch with more augmentation but wasn’t able to significantly improve my scores.
I could successfully run 4x inferences within the time limits.
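For anyone curious what "4x inference" as test-time augmentation might look like, here is a simple sketch: run the model on time-shifted copies of the mix and average the un-shifted estimates. The shift values and function name are my own choices for illustration, not what I actually submitted:

```python
import torch

def tta_separate(model, mix, shifts=(0, 2048, 4096, 8192)):
    """Hypothetical sketch of test-time augmentation for source
    separation: infer on time-shifted copies of the mix, undo each
    shift, and average. Four shifts -> 4x the inference cost.
    mix: (channels, time) tensor."""
    estimates = []
    for s in shifts:
        shifted = torch.roll(mix, shifts=s, dims=-1)
        with torch.no_grad():
            est = model(shifted.unsqueeze(0)).squeeze(0)
        estimates.append(torch.roll(est, shifts=-s, dims=-1))
    return torch.stack(estimates).mean(dim=0)
```

Averaging over shifts mainly smooths out frame-alignment artifacts from the STFT; in my runs it didn't move the scores much, consistent with what I said above.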