Resource restrictions for training the submissions

waleedgondal · July 5, 2019, 2:27pm

The general limits on resources (time and compute) for training submissions on the evaluation servers are as follows:

The upper limit for training time is 8 hours.
Compute is restricted to 2 CPU cores, 1 GPU (Tesla K80), and 8 GB of RAM.

If you have questions about these limits or any special requests, please comment below.

rauf_kurbanov · July 22, 2019, 5:43am

@waleedgondal @mohanty
Do we have any restrictions on the evaluation time?

I tried to reproduce your evaluation pipeline on my own Kubernetes pod with exactly the same resources, and evaluation takes about 2-3 hours per metric.

It seems perfectly sane to run evaluation as a separate job after training, but would it cause me any problems at later stages if the whole evaluation happens to take more than 10 hours?

mohanty · July 23, 2019, 10:03am

@rauf_kurbanov: The total time limit for the whole of training + evaluation is now 8 hours. This could potentially be increased, but I have to check with the rest of the team.
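
For anyone planning around the combined limit, here is a minimal sketch of a wall-clock guard that stops training early enough to leave room for evaluation. The class name `TimeBudget`, the `reserve_s` parameter, and the size of the reserve are all assumptions for illustration; only the 8-hour total comes from this thread, and the servers may measure time differently.

```python
import time

# 8-hour limit for training + evaluation, as stated in this thread.
TOTAL_BUDGET_S = 8 * 3600


class TimeBudget:
    """Hypothetical helper: tracks elapsed wall-clock time against a budget."""

    def __init__(self, total_s, reserve_s, clock=time.monotonic):
        # reserve_s is the time we choose to keep free for evaluation
        # (an assumption; pick it from your own local measurements).
        if reserve_s > total_s:
            raise ValueError("reserve exceeds total budget")
        self.total_s = total_s
        self.reserve_s = reserve_s
        self._clock = clock
        self._start = clock()

    def elapsed(self):
        """Seconds since this budget object was created."""
        return self._clock() - self._start

    def can_continue_training(self, step_estimate_s=0.0):
        """True if one more training step fits before the evaluation reserve."""
        deadline = self.total_s - self.reserve_s
        return self.elapsed() + step_estimate_s <= deadline
```

A training loop would create the budget once at startup, for example `TimeBudget(TOTAL_BUDGET_S, reserve_s=3 * 3600)`, and check `can_continue_training(...)` before each epoch or checkpoint interval.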

amirabdi · July 25, 2019, 4:28am

Please make sure that only the “training and evaluation” time is counted towards the 8 hours, and not the following:

build time,
time spent waiting in queues to start either training or evaluation.

Thanks.
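
To make the requested accounting concrete, the sketch below sums only the phases that would count towards the limit under this proposal. The phase names and example durations are hypothetical, and this reflects the request above rather than a confirmed organizer policy.

```python
# Phases that would count towards the limit under the request above.
COUNTED_PHASES = {"training", "evaluation"}
LIMIT_HOURS = 8.0  # limit stated earlier in this thread


def counted_hours(phase_hours):
    """Sum the hours of counted phases; build and queue time are ignored."""
    return sum(h for phase, h in phase_hours.items() if phase in COUNTED_PHASES)


def within_limit(phase_hours, limit_hours=LIMIT_HOURS):
    """True if the counted phases fit inside the limit."""
    return counted_hours(phase_hours) <= limit_hours


# Hypothetical submission timeline, in hours (illustrative numbers only).
example = {
    "build": 0.5,
    "queue_training": 1.0,
    "training": 5.0,
    "queue_evaluation": 0.5,
    "evaluation": 2.5,
}
```

With these illustrative numbers the submission spends 9.5 hours end to end, but only 7.5 hours would count, so `within_limit(example)` is true.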