Hi @felixlaumon,
I have responded on the relevant issue.
We had some issues with the compute cluster used for the evaluation, and this submission fell through the cracks when doing some maintenance.
This will be requeued.
Update: The evaluation has now completed successfully.
Hi @mohanty, it appears my new submission is stuck at pending evaluation. This time the error message seems to be related to a timeout with GitLab.
Hi @mohanty, I think I have isolated the problem to enabling the GPU during evaluation. I made a new submission (issue #10) based on the obstacle-tower-challenge master from GitHub, and the only modification is setting gpu: true in aicrowd.json.
Is GPU not supported during evaluation, or is this a bug? Let me know if you need more details to reproduce this issue.
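For anyone trying to reproduce this, the one-line change can be scripted. This is only a sketch: the `gpu` key and the aicrowd.json filename come from this thread, while the `set_gpu` helper is hypothetical and makes no assumption about the other keys in the file.

```python
import json
from pathlib import Path


def set_gpu(config_path, enabled):
    """Set the top-level "gpu" flag in a JSON config file.

    Hypothetical helper: every other key in the file is left untouched.
    Returns the updated config as a dict.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["gpu"] = bool(enabled)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `set_gpu("aicrowd.json", True)` from the repository root would then produce the same modification described above.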
Hey, yes, we also found some issues with the GPU configuration on the cluster and have since fixed them. We have requeued both of your submissions that were stuck.
@mohanty That’s good news. But it seems I have used up my quota for today just testing the submission. Is it possible to reset my quota for today? Thanks!
@felixlaumon: I pasted the logs, but it looks like it’s the timeout exception again (even though I see that you have a 10 min timeout set in your code). This might need a closer look from @arthurj, @harperj and @anhad.
I suspect that you guys might have a race condition when the agent (run.sh) and the environment (env.sh) are launched in Docker. Importing RainbowAgent and tensorflow usually takes a bit of time (a few seconds), which might cause env.sh to try to listen on the port before the env is ready in run.py.
I can replicate this issue locally: I always have to wait for the "Start training by pressing the Play button in the Unity Editor" message to show up in run.py before I launch env.sh. Otherwise the environment times out.
You can probably replicate this issue by putting time.sleep(10) at the very beginning of run.py.
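To make the suspected race concrete, here is a toy simulation. It does not use the real Obstacle Tower or Docker setup: the `simulate` function and its two parameters are made up for illustration, with a thread standing in for run.py and a timed wait standing in for the environment's connection timeout.

```python
import threading
import time


def simulate(agent_startup_delay, env_timeout):
    """Return True if the simulated environment connects before timing out.

    agent_startup_delay: seconds the agent spends before it is ready
        (stands in for slow imports, or time.sleep(10) in run.py).
    env_timeout: seconds the environment is willing to wait.
    """
    agent_ready = threading.Event()

    def agent():
        time.sleep(agent_startup_delay)  # slow start-up work
        agent_ready.set()                # agent is now ready for a connection

    worker = threading.Thread(target=agent, daemon=True)
    worker.start()
    # The environment side: wait for the agent, but only up to env_timeout.
    connected = agent_ready.wait(timeout=env_timeout)
    worker.join()
    return connected


if __name__ == "__main__":
    print("fast agent connects:", simulate(0.05, 1.0))
    print("slow agent connects:", simulate(0.3, 0.05))
```

Whenever the start-up delay exceeds the timeout, the simulated environment gives up, which matches the behaviour I see when run.py is slow to get going.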
While deferring the imports works for me for now, it is not ideal, so it would be great if you guys could look into this issue. Please let me know if there is any further information you’d like me to provide.
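For reference, the workaround looks roughly like the sketch below. The names are placeholders rather than the actual run.py code: `make_env` stands for whatever constructs the environment, and `heavy_modules` for the slow imports (tensorflow and the agent module in my case).

```python
import importlib
import time


def start(make_env, heavy_modules):
    """Create the environment first, then import the slow modules.

    make_env: hypothetical zero-argument factory that builds the environment.
    heavy_modules: names of modules whose import is deferred until the
        environment has been created.
    Returns (env, imported_modules, seconds_until_env_ready).
    """
    t0 = time.monotonic()
    env = make_env()  # do this before anything slow
    ready_after = time.monotonic() - t0
    # Only now pay the import cost, via importlib instead of top-level imports.
    modules = [importlib.import_module(name) for name in heavy_modules]
    return env, modules, ready_after


if __name__ == "__main__":
    # Stand-ins: a dummy environment and a stdlib module in place of tensorflow.
    env, modules, ready_after = start(lambda: object(), ["json"])
    print(f"environment ready after {ready_after:.3f}s")
```

The point is only the ordering: nothing slow runs before the environment is created. It works, but it forces the imports out of their usual place at the top of the file, which is why I would prefer a fix on the evaluator side.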