@mohanty - Have you been able to update v2.2? I resubmitted my agent but got errors:
@joe_booth - assuming you’ve updated to the latest binary, and also the latest version of obstacle-tower-env, there may be an issue on our end with the build. I will look into it. Just as a test, can you try running the evaluation locally?
@arthurj: confirming that not a single evaluation has passed since the update to the binary. The evaluations all time out with the good old:
Traceback (most recent call last):
  File "run.py", line 122, in <module>
    env = ObstacleTowerEnv(args.environment_filename, docker_training=args.docker_training, retro=False)
  File "/srv/conda/lib/python3.6/site-packages/obstacle_tower_env.py", line 45, in __init__
    timeout_wait=timeout_wait)
  File "/srv/conda/lib/python3.6/site-packages/mlagents_envs/environment.py", line 69, in __init__
    aca_params = self.send_academy_parameters(rl_init_parameters_in)
  File "/srv/conda/lib/python3.6/site-packages/mlagents_envs/environment.py", line 491, in send_academy_parameters
    return self.communicator.initialize(inputs).rl_initialization_output
  File "/srv/conda/lib/python3.6/site-packages/mlagents_envs/rpc_communicator.py", line 80, in initialize
    "The Unity environment took too long to respond. Make sure that :\n"
mlagents_envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
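For anyone debugging this locally: the traceback ends in a UnityTimeOutException, and the traceback itself shows that ObstacleTowerEnv forwards a timeout_wait argument to the underlying ML-Agents environment. A minimal sketch of a retry wrapper, assuming you construct the env yourself — retry_on_timeout is a hypothetical helper, not part of obstacle-tower-env, and the stub exception class stands in for the real one from mlagents_envs:

```python
class UnityTimeOutException(Exception):
    """Stand-in for mlagents_envs.exception.UnityTimeOutException."""


def retry_on_timeout(make_env, timeouts=(30, 60, 120)):
    """Call make_env(timeout_wait=t) with progressively longer timeouts.

    make_env is any callable that builds the environment, e.g.
    lambda timeout_wait: ObstacleTowerEnv(path, retro=False,
                                          timeout_wait=timeout_wait).
    Returns the first successfully-constructed env, or re-raises the
    last timeout error if every attempt fails.
    """
    last_err = None
    for t in timeouts:
        try:
            return make_env(timeout_wait=t)
        except UnityTimeOutException as err:
            last_err = err  # binary was too slow to start; try a longer wait
    raise last_err
```

This only papers over slow startup, of course; if the binary itself is broken (as it was here), no timeout is long enough.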
I tried again - this time I get ‘Unable to orchestrate submission, please contact Administrators.’
@joe_booth: That's a temporary glitch with the compute cluster; it's being fixed as we speak. The issue with the v2.2 binary still persists though, causing timeout exceptions.
Update: The evaluation has been re-enqueued and is being evaluated right now.
Update: We found a bug in the binary, and I have just pushed a fix. Evaluations seem to be working now.
@mohanty - it failed - it looks like it gets stuck when recording the video and then times out
Same issue here, stuck on generating video. Maybe that feature should simply be removed until it can be implemented reliably.
Agree!! Removing the video generation part right now, until we fix the bug properly. Will post an update here after some tests within the next hour.
Update: I have a hacky fix (until I get a neater one from @harperj), which might solve the problem by limiting the video sizes. Should again report back within the next hour.
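The actual fix isn't shown in the thread, but "limiting the video sizes" could look something like the sketch below — cap_frames is a hypothetical helper (not the real patch), assuming episode frames are collected in a list before being encoded into the video:

```python
def cap_frames(frames, max_frames=3000):
    """Bound the number of frames handed to the video encoder.

    If the episode produced more than max_frames frames, subsample
    every k-th frame so the capped video still spans the whole episode
    instead of just its beginning.
    """
    if len(frames) <= max_frames:
        return frames
    step = -(-len(frames) // max_frames)  # ceiling division
    return frames[::step]
```

Capping the frame count bounds both encoding time and output size, which is one way to keep a long episode from stalling the evaluation pipeline.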
It looks like the video worked and it updated the leaderboard but then it started scoring again.
oh strange - it recorded an average floor of 6, then removed it when it restarted training
I am working on it right now. The production cluster has some weird quirks, it seems, so I have to debug with your submission on the production cluster.
Will post an update here as soon as I have something. Working on it right now.
Took a bit longer than the initially predicted 1 hour, but it looks like evaluations are working again now with the new v2.2 binary, and the video generation is also working.
(Classic case of introducing a new bug to deal with an old bug, and ending up creating a feature.)
Looks good! It would be great to have the videos fixed so we can see @unixpickle's agent through level 16!!
What’s the logic behind the videos? Is only the weakest episode recorded?