UnityTimeOutException in evaluation

#1

I see this stack trace when I try to submit an agent that worked previously:

2019-05-24T17:42:52.825782788Z root
2019-05-24T17:43:05.17870298Z INFO:mlagents_envs:Start training by pressing the Play button in the Unity Editor.
2019-05-24T17:43:35.184856349Z Traceback (most recent call last):
2019-05-24T17:43:35.184901058Z   File "run.py", line 56, in <module>
2019-05-24T17:43:35.184908019Z     env = create_single_env(args.environment_filename, docker_training=args.docker_training)
2019-05-24T17:43:35.184929282Z   File "/home/aicrowd/util.py", line 16, in create_single_env
2019-05-24T17:43:35.184932961Z     env = ObstacleTowerEnv(path, **kwargs)
2019-05-24T17:43:35.184935919Z   File "/srv/conda/lib/python3.6/site-packages/obstacle_tower_env.py", line 45, in __init__
2019-05-24T17:43:35.184939382Z     timeout_wait=timeout_wait)
2019-05-24T17:43:35.184942214Z   File "/srv/conda/lib/python3.6/site-packages/mlagents_envs/environment.py", line 69, in __init__
2019-05-24T17:43:35.184945802Z     aca_params = self.send_academy_parameters(rl_init_parameters_in)
2019-05-24T17:43:35.184948806Z   File "/srv/conda/lib/python3.6/site-packages/mlagents_envs/environment.py", line 491, in send_academy_parameters
2019-05-24T17:43:35.184952019Z     return self.communicator.initialize(inputs).rl_initialization_output
2019-05-24T17:43:35.184954878Z   File "/srv/conda/lib/python3.6/site-packages/mlagents_envs/rpc_communicator.py", line 80, in initialize
2019-05-24T17:43:35.184958142Z     "The Unity environment took too long to respond. Make sure that :\n"
2019-05-24T17:43:35.184963164Z mlagents_envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
2019-05-24T17:43:35.184968225Z 	 The environment does not need user interaction to launch
2019-05-24T17:43:35.184972965Z 	 The Academy and the External Brain(s) are attached to objects in the Scene
2019-05-24T17:43:35.184977708Z 	 The environment and the Python interface have compatible versions.

I am definitely using v2.1 of the environment. I'm not sure what's going on, but this happened on several submissions in a row, and I see that nobody else has submitted successfully in a few days.
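In the log above, the UnityTimeOutException is raised about 30 seconds after the "Start training" line. A quick way to measure that gap from the Docker-style timestamps is sketched below. Those timestamps carry nanoseconds, which is more precision than Python's `datetime` keeps, so the helper truncates the fraction to microseconds. The helper name `parse_log_timestamp` is made up for illustration. A gap that short looks consistent with the environment still using a short default `timeout_wait`, but that is an inference from the log, not something confirmed in this thread.

```python
from datetime import datetime, timezone


def parse_log_timestamp(ts):
    """Parse a timestamp like '2019-05-24T17:43:05.17870298Z' (hypothetical helper)."""
    main, frac = ts.rstrip("Z").split(".")
    # datetime only supports microseconds: pad/truncate the fraction to 6 digits.
    frac = (frac + "000000")[:6]
    parsed = datetime.strptime(main + "." + frac, "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


# Timestamps copied from the "Start training" line and the first traceback line.
start = parse_log_timestamp("2019-05-24T17:43:05.17870298Z")
failure = parse_log_timestamp("2019-05-24T17:43:35.184856349Z")
elapsed = (failure - start).total_seconds()
print(round(elapsed, 3))  # prints 30.006
```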

#2

@unixpickle: We acknowledge the problem you describe and are looking into it right now. The only recent change was updating the env binary to v1.2, and that update was tested before it was deployed.
We hope to post an update on this thread soon.

#3

@unixpickle: Can you also try a submission where you pass the timeout_wait parameter (https://github.com/Unity-Technologies/obstacle-tower-env/blob/master/obstacle_tower_env.py#L26) when initializing the env, and set it to something like 900 just to be safe?
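The traceback shows `run.py` calling `create_single_env(args.environment_filename, docker_training=args.docker_training)`, which in turn calls `ObstacleTowerEnv(path, **kwargs)`. The suggestion above amounts to adding `timeout_wait` to those kwargs. Below is a minimal sketch of a variant of that helper that always forwards a longer timeout. It is hypothetical, not the starter kit's actual `util.py`. The `env_cls` parameter exists only so the sketch can be exercised without the Unity binary installed.

```python
def create_single_env(path, timeout_wait=900, env_cls=None, **kwargs):
    """Hypothetical variant of util.create_single_env that forwards timeout_wait."""
    if env_cls is None:
        # Imported lazily so the helper can also be tried with a stand-in class.
        from obstacle_tower_env import ObstacleTowerEnv
        env_cls = ObstacleTowerEnv
    # timeout_wait is passed through to ObstacleTowerEnv's constructor, which
    # hands it on to the ML-Agents environment (see the traceback above).
    return env_cls(path, timeout_wait=timeout_wait, **kwargs)
```

In `run.py` the call site could then stay close to the original, e.g. `env = create_single_env(args.environment_filename, docker_training=args.docker_training, timeout_wait=900)`.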

#4

@mohanty, looks like your suggestion worked! The submission runs now.

#5

Looping in @arthurj @harperj: We should find a clean way to override the timeout_wait parameter from the agent side. One option is to expose an environment variable that overrides timeout_wait, so that participants wouldn't have to set it manually and we could adjust it dynamically based on the evaluator's current setup.
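The environment-variable idea could look roughly like the sketch below on the agent side. The variable name `OTC_TIMEOUT_WAIT` and the function `resolve_timeout_wait` are hypothetical. Neither the evaluator nor obstacle-tower-env defines them in this thread. The agent keeps its own default and lets the evaluator raise it when needed.

```python
import os

# Hypothetical variable name; nothing in the thread or the libraries defines it.
TIMEOUT_ENV_VAR = "OTC_TIMEOUT_WAIT"


def resolve_timeout_wait(default, environ=None):
    """Return the evaluator-provided timeout in seconds if set and valid, else `default`."""
    if environ is None:
        environ = os.environ
    raw = environ.get(TIMEOUT_ENV_VAR, "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        # Ignore malformed values rather than crashing the agent at startup.
        return default
    # Non-positive values make no sense as a timeout, so fall back to the default.
    return value if value > 0 else default
```

Usage would be something like `ObstacleTowerEnv(path, docker_training=True, timeout_wait=resolve_timeout_wait(900))`. Falling back to the default on bad input keeps a misconfigured evaluator from breaking every submission.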

#6

Spoke too soon. I just made another submission and got another timeout after 998 seconds (my timeout was set to 900). It must be non-deterministic.

#7

@mohanty, could you post the logs for my submission (https://gitlab.aicrowd.com/joe_booth/obstacle-tower-challenge/issues/121) here? I'm not sure whether it is the same problem. Thanks.