The general limits on resources (time and compute) for training submissions on the evaluation servers are as follows:
- The upper limit for training time is 8 hours.
- Compute is restricted to 2 CPU cores, 1 GPU (Tesla K80), and 8 GB of RAM.
If you have questions about these limits or any special requests, please comment below.
Do we have any restrictions on the evaluation time?
I tried to reproduce your evaluation pipeline on my own Kubernetes pod with exactly the same resources, and evaluation takes me about 2-3 hours per metric.
It seems perfectly sane to run evaluation as a separate job after training, but would it cause me any problems at later stages if the whole evaluation happens to take more than 10 hours?
@rauf_kurbanov: The total time limit for the whole training + evaluation run is now 8 hours. This could potentially be increased, but I will have to check with the rest of the team.
Please make sure that only the “training and evaluation” time is counted towards the 8 hours, and not the following:
- build time,
- time spent waiting in queues to initialize either training or evaluation
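
For anyone who wants to check their own runs against the limit locally, here is a minimal sketch of the accounting described above: only time spent in training and evaluation is counted towards the 8 hours, while build and queue time are recorded but ignored. This is not the organizers' implementation; the `PhaseTimer` class and the phase names (`"build"`, `"queue"`, `"training"`, `"evaluation"`) are hypothetical names chosen for illustration.

```python
import time
from contextlib import contextmanager

# 8-hour limit for training + evaluation, as stated in this thread.
BUDGET_SECONDS = 8 * 60 * 60


class PhaseTimer:
    """Accumulates wall-clock time per named phase (hypothetical helper)."""

    # Assumption: only these phases count towards the budget.
    COUNTED = {"training", "evaluation"}

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.elapsed = {}

    @contextmanager
    def phase(self, name):
        # Record the time spent inside the `with` block under `name`.
        start = self._clock()
        try:
            yield
        finally:
            spent = self._clock() - start
            self.elapsed[name] = self.elapsed.get(name, 0.0) + spent

    def counted_seconds(self):
        # Sum only the phases that count towards the limit.
        return sum(t for name, t in self.elapsed.items() if name in self.COUNTED)

    def remaining_seconds(self):
        return BUDGET_SECONDS - self.counted_seconds()
```

A run would wrap each step, for example `with timer.phase("training"): ...`, and compare `timer.remaining_seconds()` against zero before starting another evaluation metric.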