Constantly getting agent terminated prematurely error on submission

ross_wightman · March 28, 2019, 5:57pm

I’m getting this error on all of my recent submissions. The code changes are very minor compared with my working submissions. Last week, on my first submissions, I hit this error as well: I turned the debug flag on with no other changes and it worked, then turned debug back off and it worked again.

Today and last night I submitted multiple times and hit the same failure. I enabled the debug flag in the JSON with no code changes, and everything worked. When I turned the debug flag back off, it failed again.

Any idea what’s happening? The only info in the issue for the submission is the text below:

The following containers terminated prematurely. : agent    
 Please contact administrators, or refer to the execution logs.
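
For anyone reproducing the on/off comparison described above, here is a minimal sketch of flipping only the debug flag between two otherwise identical submissions. It assumes the submission config is a JSON file with a top-level "debug" key; the helper name set_debug_flag and the file path passed to it are hypothetical and not taken from the starter kit.

```python
import json
from pathlib import Path


def set_debug_flag(config_path, enabled):
    """Set the top-level "debug" key in a JSON config, leaving other keys untouched.

    Assumption: the config is a single JSON object. The key name "debug" follows
    the posts in this thread; everything else about the file is unknown here.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["debug"] = bool(enabled)
    # Write the file back with stable formatting so the diff shows only the flag.
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Committing the result of this change on its own makes it easy to confirm that the flag really is the only difference between a passing and a failing submission.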

jongwook_choi · March 31, 2019, 9:50pm

I’m experiencing the same here, only difference is the debug flag.

ross_wightman · April 1, 2019, 4:51am

So you experience the issue with the debug flag both on and off? Was there anything in the logs?

I have only had the issue when submitting for scoring with the debug flag disabled. Whenever I turn debug on to see what the problem is, the submission works fine, so I’ve got nothing to go on. I @-mentioned the admins in the GitLab submission issue but received no response from them.

I’ve tried running with and without GPU, but there is no consistent ‘working’ state. It seems random, which suggests a timing issue in the evaluation mechanics.

My agent eval code is very simple and shouldn’t have variable timing. It’s based on PyTorch, so startup should be really fast, whereas TensorFlow tends to take more time to initialize and allocate tensor memory.

mohanty · April 1, 2019, 1:24pm

@ross_wightman: Can you please DM me the link to the issue?

@harperj: Can you please make the necessary changes to the starter kit to include more information about the configurable timeout during env instantiation?
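
Until the starter kit documents that timeout, one generic way to guard against a slow environment startup on the agent side is to retry the instantiation. The sketch below is only an illustration of that idea, not the starter kit's timeout setting: create_with_retries and its factory argument are hypothetical names, and the factory is assumed to raise an exception when the environment fails to come up in time.

```python
import time


def create_with_retries(factory, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call factory() until it succeeds or the attempts run out.

    factory is a hypothetical zero-argument callable that builds the environment
    and raises an exception if startup fails or times out.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return factory()
        except Exception as error:  # startup failures vary by library, so catch broadly
            last_error = error
            print(f"attempt {attempt}/{attempts} failed: {error!r}")
            if attempt < attempts:
                sleep(delay_seconds)  # give the environment process time to come up
    raise RuntimeError(
        f"environment did not start after {attempts} attempts"
    ) from last_error
```

In an agent script, the factory would be a small lambda or function wrapping whatever call creates the environment, with any timeout argument that call supports set inside it.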

xihe · April 1, 2019, 1:45pm

Hi, I also have the same issue, could you check it out? – https://gitlab.aicrowd.com/xihe/obstacle-tower-challenge/issues/13

xihe · April 1, 2019, 2:48pm

I resubmitted and it’s still not working, https://gitlab.aicrowd.com/xihe/obstacle-tower-challenge/issues/17

mohanty · April 1, 2019, 3:05pm

@xihe: I posted a comment here: https://gitlab.aicrowd.com/xihe/obstacle-tower-challenge/issues/17

xihe · April 1, 2019, 3:41pm

Hi again, and thank you for all the help. I’m still getting errors after performing your fix – https://gitlab.aicrowd.com/xihe/obstacle-tower-challenge/issues/18

xihe · April 1, 2019, 4:48pm

Have I used up all my submissions? The bot isn’t creating new issues when I push new tags :s

jongwook_choi · April 1, 2019, 6:21pm

The debug run with debug: true ran successfully, with 5 episodes completed. The next submission had debug: false (the only difference) with exactly the same code, and it failed (no logs can be seen unless it’s in debug mode). I pinged the admins but haven’t heard from them yet.

Just out of curiosity, I made another “identical” submission (with debug: false) and this time it worked. So it must not have been my fault.

ross_wightman · April 1, 2019, 6:35pm

This is exactly what I see. I direct messaged mohanty some of the issues.

mohanty · April 2, 2019, 12:14pm

@ross_wightman: Please refer to my response here: Submissions Q&A