Is the new v2.2 used for scoring?


@mohanty - Have you been able to update to v2.2? I resubmitted my agent but got errors:


@joe_booth: Yes, we are using v2.2r0 for the evaluation. @arthurj can provide more details about any changes you need to make at your end.


Hi @joe_booth and @mohanty

@joe_booth - assuming you’ve updated to the latest binary, and also the latest version of obstacle-tower-env, there may be an issue on our end with the build. I will look into it. Just as a test, can you try running the evaluation locally?


@arthurj: Confirming that not a single evaluation has passed since the update to the binary. The evaluations time out in all cases with the good old:

2019-06-11T04:51:23.840977159Z Traceback (most recent call last):
2019-06-11T04:51:23.841019586Z   File "", line 122, in <module>
2019-06-11T04:51:23.841025795Z     env = ObstacleTowerEnv(args.environment_filename, docker_training=args.docker_training, retro=False)
2019-06-11T04:51:23.841030215Z   File "/srv/conda/lib/python3.6/site-packages/", line 45, in __init__
2019-06-11T04:51:23.841034598Z     timeout_wait=timeout_wait)
2019-06-11T04:51:23.841038452Z   File "/srv/conda/lib/python3.6/site-packages/mlagents_envs/", line 69, in __init__
2019-06-11T04:51:23.841042766Z     aca_params = self.send_academy_parameters(rl_init_parameters_in)
2019-06-11T04:51:23.841046878Z   File "/srv/conda/lib/python3.6/site-packages/mlagents_envs/", line 491, in send_academy_parameters
2019-06-11T04:51:23.841051249Z     return self.communicator.initialize(inputs).rl_initialization_output
2019-06-11T04:51:23.841054996Z   File "/srv/conda/lib/python3.6/site-packages/mlagents_envs/", line 80, in initialize
2019-06-11T04:51:23.841059079Z     "The Unity environment took too long to respond. Make sure that :\n"
2019-06-11T04:51:23.841063163Z mlagents_envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
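For anyone reproducing this locally, here is a minimal sketch of retrying environment start-up when the Unity handshake times out. The helper `make_env_with_retry` and its parameters are hypothetical and not part of obstacle-tower-env; the factory is meant to wrap the `ObstacleTowerEnv(...)` call shown in the traceback above. A retry only helps with transient start-up delays and would not work around a bug in the binary itself.

```python
import time


def make_env_with_retry(env_factory, retry_on=(Exception,), attempts=3, delay_s=0.0):
    """Call env_factory() until it succeeds or the attempts run out.

    env_factory: zero-argument callable that builds and returns the environment.
    retry_on:    exception types that should trigger another attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return env_factory()
        except retry_on as err:
            last_error = err
            print(f"attempt {attempt}/{attempts} failed: {err}")
            if attempt < attempts:
                time.sleep(delay_s)  # give the Unity process time to come up
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

In practice the factory would be something like `lambda: ObstacleTowerEnv(environment_filename, retro=False)`, with `retry_on=(UnityTimeOutException,)` imported from `mlagents_envs.exception`, as named in the traceback.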

I tried again - this time I get ‘Unable to orchestrate submission, please contact Administrators.’


@joe_booth: That's a temporary glitch with the compute cluster, and it is being fixed as we speak. The issue with the v2.2 binary still persists, though: it is causing timeout exceptions.

Update: The evaluation has been re-enqueued and is being evaluated right now.


Update: We found a bug in the binary, and I have just pushed a fix. Evaluations seem to be working now.


@mohanty - it failed. It looks like it gets stuck when recording the video and then times out.


Nudge @arthurj @harperj : The memory leak in the video generation code strikes again :cry: !!


Same issue here, stuck on generating video. Maybe that feature should simply be removed until it can be implemented reliably.


Agreed!! Removing the video generation part right now, until we fix the bug properly. Will post an update here after some tests within the next hour.

Update: I have a hacky fix (until I get a neater one from @harperj), which might solve the problem by limiting the video sizes. Should again report back within the next hour.


It looks like the video worked and it updated the leaderboard but then it started scoring again.


Oh, strange - it recorded an average floor of 6, then removed it when it restarted scoring.


I am working on it right now. The production cluster has some weird quirks, it seems, so I have to debug with your submission on the production cluster. :angel:
Will post an update here as soon as I have something.


Took a bit longer than the initially predicted 1 hour.
But it looks like evaluations are working again now with the new v2.2 binary, and the video generation is also working.
(A classic case of introducing a new bug to deal with an old bug, and ending up creating a feature :wink: )

@joe_booth's and @unixpickle's submissions have been evaluated, and @tatsuyaogawa's is evaluating as I write this.
Best of luck.


Looks good! It would be great to have the videos fixed so we can see @unixpickle's agent through level 16!!


What’s the logic behind the videos? Is only the weakest episode recorded?


@Leckofunny: No, the videos are generated using a separate seed which is not used for the actual evaluation.
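As a rough illustration of that idea (not the evaluator's actual code), the sketch below picks a video seed that is guaranteed not to be among the evaluation seeds. The function name `pick_video_seed`, the seed range of 0 to 99, and the example seed values are all assumptions made for this example.

```python
import random


def pick_video_seed(evaluation_seeds, seed_range=range(0, 100), rng=None):
    """Return a seed from seed_range that is not used for evaluation.

    The seed range is an assumption for illustration only.
    """
    rng = rng or random.Random()
    used = set(evaluation_seeds)
    candidates = [s for s in seed_range if s not in used]
    if not candidates:
        raise ValueError("no seed left over for video generation")
    return rng.choice(candidates)


# Hypothetical evaluation seeds; the recorded episode then never
# coincides with a scored episode.
video_seed = pick_video_seed([11, 22, 33, 44, 55])
print(video_seed)
```

Under this scheme the video shows a representative run rather than the weakest or strongest scored episode.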
