Really interesting competition here. Having joined late and being less versed in the subject area, I’m trying to set reasonable objectives for what I can achieve.
I’ve started by trying to reproduce the X-UMX baseline and have set it running with the default parameters. Using a single 2080 Ti, I’m managing ~11 epochs / hour. The default parameters run for 1000 epochs with effectively no early stopping (early stopping is set to 1000 epochs).
Are these numbers representative of the task? If I’m looking at ~3.8 days per model I’m going to need to be very careful in my training runs.
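For reference, here is the arithmetic behind that ~3.8 days figure as a quick sketch. The function name is my own and not part of the baseline code, and it assumes the ~11 epochs / hour throughput stays constant for the whole run.

```python
def estimated_training_days(total_epochs: int, epochs_per_hour: float) -> float:
    """Wall-clock days needed for total_epochs at a steady epochs_per_hour."""
    hours = total_epochs / epochs_per_hour
    return hours / 24.0


# Default baseline setting (1000 epochs) at the throughput I measured on one GPU.
days = estimated_training_days(total_epochs=1000, epochs_per_hour=11)
print(f"{days:.1f} days")
```

The same function gives a rough budget for shorter runs, e.g. how many days a 300-epoch experiment would take at the same speed.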
I’m also noticing a significant difference between training and validation loss. I think this may be due to the discrepancy in duration (6s train, 80s valid), but I’m not sure. It’d be really helpful to see what a “good training run” looks like on this problem.