Evaluation error: Unity environment took too long to respond

kwea123 · March 7, 2019, 12:02pm

I have Docker set up locally and have verified that my submission works there. However, when I submit the same run.py to GitLab, I get the following error, which says the Unity environment doesn’t respond.

I added some parameters compared to the original file:
env = ObstacleTowerEnv(args.environment_filename, docker_training=args.docker_training, retro=False, realtime_mode=False)

By the way, I use a GPU (gpu:true is set in aicrowd.json); I don’t know if that’s the problem.

Here’s the error log:

2019-03-07T11:32:22.733086464Z INFO:mlagents_envs:Start training by pressing the Play button in the Unity Editor.
2019-03-07T11:32:52.766988656Z Traceback (most recent call last):
2019-03-07T11:32:52.7670384Z File "run.py", line 72, in <module>
2019-03-07T11:32:52.767043221Z env = ObstacleTowerEnv(args.environment_filename, docker_training=args.docker_training, retro=False, realtime_mode=False)
2019-03-07T11:32:52.767068055Z File "/srv/conda/lib/python3.6/site-packages/obstacle_tower_env.py", line 45, in __init__
2019-03-07T11:32:52.767071047Z timeout_wait=timeout_wait)
2019-03-07T11:32:52.767073489Z File "/srv/conda/lib/python3.6/site-packages/mlagents_envs/environment.py", line 69, in __init__
2019-03-07T11:32:52.767076089Z aca_params = self.send_academy_parameters(rl_init_parameters_in)
2019-03-07T11:32:52.767099058Z File "/srv/conda/lib/python3.6/site-packages/mlagents_envs/environment.py", line 491, in send_academy_parameters
2019-03-07T11:32:52.76710192Z return self.communicator.initialize(inputs).rl_initialization_output
2019-03-07T11:32:52.767104579Z File "/srv/conda/lib/python3.6/site-packages/mlagents_envs/rpc_communicator.py", line 80, in initialize
2019-03-07T11:32:52.767107093Z "The Unity environment took too long to respond. Make sure that :\n"
2019-03-07T11:32:52.767109735Z mlagents_envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
2019-03-07T11:32:52.767141981Z The environment does not need user interaction to launch
2019-03-07T11:32:52.76714544Z The Academy and the External Brain(s) are attached to objects in the Scene
2019-03-07T11:32:52.76714784Z The environment and the Python interface have compatible versions.

arthurj · March 7, 2019, 6:08pm

Hi @kwea123

Can you try increasing the timeout_wait parameter when launching the ObstacleTowerEnv?
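
For reference, here is a minimal sketch of passing a larger timeout_wait. The helper names (build_env_kwargs, make_env) and the value 300 are illustrative assumptions, not part of the starter kit. Judging by the 30 second gap between the two log timestamps in the original post, the default appears to be about 30 seconds.

```python
def build_env_kwargs(timeout_wait=300, docker_training=False):
    """Collect the ObstacleTowerEnv keyword arguments in one place.

    timeout_wait is in seconds; 300 is an arbitrary illustrative value.
    """
    return {
        "docker_training": docker_training,
        "retro": False,
        "realtime_mode": False,
        "timeout_wait": timeout_wait,
    }


def make_env(environment_filename, **overrides):
    """Hypothetical helper: build the environment with a longer timeout."""
    # Imported here so this file can be loaded without the package installed.
    from obstacle_tower_env import ObstacleTowerEnv

    return ObstacleTowerEnv(environment_filename, **build_env_kwargs(**overrides))
```

In run.py this would be called as make_env(args.environment_filename, docker_training=args.docker_training).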

banjtheman · March 7, 2019, 10:01pm

@arthurj I increased my timeout_wait to 30000 and got the same result. Note the timestamps in my logs:

2019-03-07T13:25:24.799545869Z root
....
2019-03-07T21:45:37.653813452Z    "The Unity environment took too long to respond. Make sure that :\n"
2019-03-07T21:45:37.653925272Z mlagents_envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
2019-03-07T21:45:37.653939834Z 	 The environment does not need user interaction to launch
2019-03-07T21:45:37.653945019Z 	 The Academy and the External Brain(s) are attached to objects in the Scene
2019-03-07T21:45:37.653949289Z 	 The environment and the Python interface have compatible versions.
2019-03-07T21:45:37.653953562Z   In call to configurable 'create_otc_environment' (<function create_otc_environment at 0x7f7dfdfcaf28>)
2019-03-07T21:45:37.65395818Z   In call to configurable 'Runner' (<function Runner.__init__ at 0x7f7dfdfca158>)
2019-03-07T21:45:37.653962707Z   In call to configurable 'create_runner' (<function create_runner at 0x7f7dfdfa9f28>)

Full logs: https://gitlab.aicrowd.com/banjtheman/obstacle-tower-challenge/issues/11

karl · March 7, 2019, 11:34pm

same issue here

"The Unity environment took too long to respond. Make sure that :\n"
mlagents_envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
   The environment does not need user interaction to launch
   The Academy and the External Brain(s) are attached to objects in the Scene
   The environment and the Python interface have compatible versions.

I'm using OS X with screen; no popup window appears (same on Ubuntu).
Thanks in advance.

shivam · March 7, 2019, 11:41pm

We have now rolled out a stability fix and hope it resolves this problem completely.
Please let us know if this still pops up so we can investigate further.

@kwea123 It looks like your latest submission has a different issue than the mlagents_envs.exception.UnityTimeOutException. If it pops up again, please let us know by replying to this thread.

@banjtheman I re-evaluated the submission you shared, and after the fix it no longer hit mlagents_envs.exception.UnityTimeOutException, although it failed on a different issue. You can see the result at the same link now.

shivam · March 7, 2019, 11:43pm

Can you share your submission link?

wywarren · March 8, 2019, 8:47am

https://gitlab.aicrowd.com/wywarren/obstacle-tower-challenge/issues/6

kwea123 · March 8, 2019, 1:03pm

Hi @shivam, can you check mine? https://gitlab.aicrowd.com/kwea123/obstacle-tower-challenge-submission-kwea123/issues/4

I submitted some test versions with the GPU disabled to see whether the GPU was the problem, so my latest submission is irrelevant.

ChenKuanSun · March 8, 2019, 2:05pm

https://gitlab.aicrowd.com/ChenKuanSun/obg/issues/1

If evaluations share the same environment, could the startup be failing because worker_id is not set while other submissions are being evaluated?
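
Whether the evaluator shares a host between submissions is not stated in this thread. For local runs, though, a port clash is easy to rule out. The sketch below assumes that ML-Agents listens on base_port + worker_id and that base_port defaults to 5005; check both against your installed version. The helper names are hypothetical.

```python
import socket

BASE_PORT = 5005  # assumed ML-Agents default; verify for your version


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_worker_id(base_port=BASE_PORT, max_workers=16):
    """Return the smallest worker_id whose port (base_port + worker_id) is free."""
    for worker_id in range(max_workers):
        if port_is_free(base_port + worker_id):
            return worker_id
    raise RuntimeError("no free port in the scanned range")
```

The returned value could then be passed as worker_id when creating the environment. There is a small window between the check and the actual bind, so this is a diagnostic aid rather than a guarantee.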

banjtheman · March 12, 2019, 2:42am

@shivam This seems to be happening again.

2019-03-12T02:22:06.918029502Z root
2019-03-12T02:22:17.32389638Z INFO:mlagents_envs:Start training by pressing the Play button in the Unity Editor.

https://gitlab.aicrowd.com/banjtheman/obstacle-tower-challenge/issues/9

The ~10 second startup time seems to be the issue, as noted in the thread “Starter kit stuck "pending" state for a day”.

Is it possible to delay the environment container by, say, 60 seconds after the agent container starts?
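
The roughly 10 second gap can be checked from the two log timestamps in this post. The sketch below parses them with the standard library; parse_docker_timestamp is a hypothetical helper, and it truncates the nanosecond fraction to microseconds because datetime does not store more.

```python
from datetime import datetime


def parse_docker_timestamp(stamp):
    """Parse e.g. '2019-03-12T02:22:06.918029502Z', truncating to microseconds."""
    date_part, _, fraction = stamp.rstrip("Z").partition(".")
    micros = (fraction + "000000")[:6]  # pad or truncate to 6 digits
    return datetime.strptime(f"{date_part}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")


start = parse_docker_timestamp("2019-03-12T02:22:06.918029502Z")
ready = parse_docker_timestamp("2019-03-12T02:22:17.32389638Z")
gap = (ready - start).total_seconds()
print(f"{gap:.3f} s")  # prints "10.406 s"
```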

mohanty · March 12, 2019, 8:28am

@banjtheman: Wasn't a timeout parameter added to the env instantiation in v1.2?

banjtheman · March 12, 2019, 1:07pm

@mohanty Yes, I’ve increased mine to 600 (even 30000 once), but all that does is keep the agent container idle. It looks like if the environment container starts before the agent displays

 INFO:mlagents_envs:Start training by pressing the Play button in the Unity Editor.

the test never starts. This is easily reproducible in a local environment.

banjtheman · March 13, 2019, 3:03pm

I was able to get around this by deferring imports so that the mlagents message appeared 1 second after startup. It's a bit hacky, but it seems to be the only way to get the test to run.
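
The exact code used is not shown in this thread, so the following is only a sketch of what deferring imports could look like: create the environment first, and import slow libraries afterwards. The function name start_env_then_import and its arguments are hypothetical.

```python
import importlib
import time


def start_env_then_import(make_env, heavy_modules):
    """Create the environment first, then import slow modules by name.

    make_env: zero-argument callable that returns the environment.
    heavy_modules: module names (for example "tensorflow") to import afterwards.
    """
    started = time.monotonic()
    env = make_env()  # connect to Unity as early as possible
    connect_seconds = time.monotonic() - started
    modules = {name: importlib.import_module(name) for name in heavy_modules}
    return env, modules, connect_seconds
```

In run.py, make_env could be a lambda wrapping the ObstacleTowerEnv call, and the model would be built from the returned modules only after the environment is up.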

2019-03-13T14:16:19.307396414Z root
2019-03-13T14:16:20.443927313Z INFO:mlagents_envs:Start training by pressing the Play button in the Unity Editor.

mohanty · March 13, 2019, 4:29pm

Please do send in a pull request to the official docs. Many participants seem to be having the same issue.

huixxi · March 17, 2019, 2:51pm

I ran the official tutorial code on GCP and changed timeout_wait to 6000000, but it still raises the same error. How did you fix the problem?

mohanty · March 17, 2019, 8:25pm

Are you doing the env instantiation before or after loading your model?

huixxi · March 18, 2019, 1:10am

I just put the ObstacleTower folder which includes the obstacle.x86_64 file in the proper directory. Then I directly run the train.py as the tutorial says. Do I need to start the obstacle.x86_64 file manually before running the train.py?
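
As far as the ObstacleTowerEnv signature in this thread suggests, the wrapper takes the path to the binary and waits for it to respond, so it should not need to be started by hand. One thing that can be ruled out quickly is a wrong path or a missing execute bit. This is a hypothetical pre-flight check and not a confirmed cause of the timeout:

```python
import os


def check_env_binary(path):
    """Return a list of problems found with the environment binary path."""
    problems = []
    if not os.path.isfile(path):
        problems.append(f"not a file: {path}")
    elif not os.access(path, os.X_OK):
        problems.append(f"not executable (try chmod +x): {path}")
    return problems
```

An empty list means the file exists and is executable; it says nothing about whether the machine can actually render the environment.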

huixxi · March 18, 2019, 4:35am

I have also tested the env locally and it worked well, but once I run the code on Google Colab, the error appears.

Petero · March 22, 2019, 9:56am

Hi, I have the same problem. I am using AWS EC2.

UnityTimeOutException Traceback (most recent call last)
in ()
----> 1 env = ObstacleTowerEnv('/home/ubuntu/ObstacleTower/obstacletower', retro=True)

~/anaconda3/lib/python3.6/site-packages/obstacle_tower_env.py in __init__(self, environment_filename, docker_training, worker_id, retro, timeout_wait, realtime_mode)

~/anaconda3/lib/python3.6/site-packages/mlagents_envs/environment.py in __init__(self, file_name, worker_id, base_port, seed, docker_training, no_graphics, timeout_wait)
67 )
68 try:
---> 69 aca_params = self.send_academy_parameters(rl_init_parameters_in)
70 except UnityTimeOutException:
71 self._close()

~/anaconda3/lib/python3.6/site-packages/mlagents_envs/environment.py in send_academy_parameters(self, init_parameters)
489 inputs = UnityInput()
490 inputs.rl_initialization_input.CopyFrom(init_parameters)
--> 491 return self.communicator.initialize(inputs).rl_initialization_output
492
493 def wrap_unity_input(self, rl_input: UnityRLInput) -> UnityOutput:

~/anaconda3/lib/python3.6/site-packages/mlagents_envs/rpc_communicator.py in initialize(self, inputs)
78 if not self.unity_to_external.parent_conn.poll(self.timeout_wait):
79 raise UnityTimeOutException(
---> 80 "The Unity environment took too long to respond. Make sure that :\n"
81 "\t The environment does not need user interaction to launch\n"
82 "\t The Academy and the External Brain(s) are attached to objects in the Scene\n"

UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
The environment does not need user interaction to launch
The Academy and the External Brain(s) are attached to objects in the Scene
The environment and the Python interface have compatible versions.

Petero · March 23, 2019, 3:24am

On GCP with tensorflow-gpu==1.13.0 and an NVIDIA P100 GPU, I still have the same problem.