Evaluation error: Unity environment took too long to respond

We have now rolled out a stability fix and hope it resolves this problem completely.
Please let us know if the error still pops up so we can investigate further.


@kwea123 It looks like your latest submission is hitting a different issue than the mlagents_envs.exception.UnityTimeOutException. If the timeout pops up again, please let us know by replying to this thread.

@banjtheman I re-evaluated the submission you shared, and after the fix it no longer hits mlagents_envs.exception.UnityTimeOutException, although it failed on a different issue, which you can now view at the link.


Can you share your submission link?

https://gitlab.aicrowd.com/wywarren/obstacle-tower-challenge/issues/6

Hi @shivam can you check mine? https://gitlab.aicrowd.com/kwea123/obstacle-tower-challenge-submission-kwea123/issues/4

I submitted some test versions with the GPU disabled to see whether the GPU is the problem, so my latest submission is irrelevant.

https://gitlab.aicrowd.com/ChenKuanSun/obg/issues/1

If all evaluations run in the same environment, could the startup be failing because worker_id is not set while other submissions are being evaluated?
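One way to rule out the worker_id question locally is to pick an ID whose port is not already taken. This is only a minimal sketch: it assumes the environment listens on base_port + worker_id and that base_port defaults to 5005, neither of which is confirmed in this thread, and find_free_worker_id and port_is_free are hypothetical helper names.

```python
import socket

# Assumed default base port; check the value used by your installed version.
DEFAULT_BASE_PORT = 5005


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def find_free_worker_id(base_port=DEFAULT_BASE_PORT, max_workers=32,
                        is_free=port_is_free):
    """Return the first worker_id whose port (base_port + worker_id) is free."""
    for worker_id in range(max_workers):
        if is_free(base_port + worker_id):
            return worker_id
    raise RuntimeError("no free port found for any worker_id")
```

The returned value could then be passed as the worker_id argument when creating the environment. Whether the evaluator itself is affected by port clashes is something only the organizers can confirm.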

@shivam seems to be happening again

2019-03-12T02:22:06.918029502Z root
2019-03-12T02:22:17.32389638Z INFO:mlagents_envs:Start training by pressing the Play button in the Unity Editor.

https://gitlab.aicrowd.com/banjtheman/obstacle-tower-challenge/issues/9

The ~10 second startup time seems to be the issue, as noted in the thread Starter kit stuck "pending" state for a day

Is it possible to delay the environment container by, say, 60 seconds after the agent container starts?


@banjtheman: Wasn't a timeout parameter added to the env instantiation in v1.2?

@mohanty yeah, I’ve increased mine to 600 (even 30000 once), but all that does is keep the agent container idle. It looks like if the environment container starts before the agent displays

 INFO:mlagents_envs:Start training by pressing the Play button in the Unity Editor.

the test never starts. This is easily reproducible in a local environment.

I was able to get around this by using the deferred-import strategy, so that the mlagents_envs message came 1 second after startup. It's a bit hacky, but it seems to be the only way to get the test to run.

2019-03-13T14:16:19.307396414Z root
2019-03-13T14:16:20.443927313Z INFO:mlagents_envs:Start training by pressing the Play button in the Unity Editor.
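For anyone wondering what deferring the import can look like, here is a minimal sketch. It assumes the idea is simply to finish all slow setup (such as model loading) first and to import the environment package only right before instantiating it. make_env_deferred and run are hypothetical helper names, not part of the starter kit.

```python
import importlib


def make_env_deferred(module_name, class_name, *args, **kwargs):
    """Import module_name only at call time, then build class_name(*args, **kwargs)."""
    module = importlib.import_module(module_name)  # the import happens here, not at startup
    env_class = getattr(module, class_name)
    return env_class(*args, **kwargs)


def run(load_model, module_name, class_name, *env_args, **env_kwargs):
    """Load the model first, then import and create the environment last."""
    model = load_model()  # slow setup runs before any environment import
    env = make_env_deferred(module_name, class_name, *env_args, **env_kwargs)
    return model, env
```

In an agent script this might be called as run(load_model, "obstacle_tower_env", "ObstacleTowerEnv", path_to_env, retro=True), where load_model and path_to_env are placeholders for your own code.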

Please do send in a pull request to the official docs. Many participants seem to be having the same issue.

I ran the official tutorial code on GCP and changed timeout_wait to 6000000, but it still raises the same error. How did you fix the problem?

Are you doing the env instantiation before or after loading your model?

I just put the ObstacleTower folder which includes the obstacle.x86_64 file in the proper directory. Then I directly run the train.py as the tutorial says. Do I need to start the obstacle.x86_64 file manually before running the train.py?

I have also tested the env locally and it worked well, but once I run the code on Google Colab, the error appears.

Hi, I have the same problem, and I am using AWS EC2.

UnityTimeOutException                     Traceback (most recent call last)
in ()
----> 1 env = ObstacleTowerEnv('/home/ubuntu/ObstacleTower/obstacletower', retro=True)

~/anaconda3/lib/python3.6/site-packages/obstacle_tower_env.py in __init__(self, environment_filename, docker_training, worker_id, retro, timeout_wait, realtime_mode)

~/anaconda3/lib/python3.6/site-packages/mlagents_envs/environment.py in __init__(self, file_name, worker_id, base_port, seed, docker_training, no_graphics, timeout_wait)
     67             )
     68         try:
---> 69             aca_params = self.send_academy_parameters(rl_init_parameters_in)
     70         except UnityTimeOutException:
     71             self._close()

~/anaconda3/lib/python3.6/site-packages/mlagents_envs/environment.py in send_academy_parameters(self, init_parameters)
    489         inputs = UnityInput()
    490         inputs.rl_initialization_input.CopyFrom(init_parameters)
--> 491         return self.communicator.initialize(inputs).rl_initialization_output
    492
    493     def wrap_unity_input(self, rl_input: UnityRLInput) -> UnityOutput:

~/anaconda3/lib/python3.6/site-packages/mlagents_envs/rpc_communicator.py in initialize(self, inputs)
     78         if not self.unity_to_external.parent_conn.poll(self.timeout_wait):
     79             raise UnityTimeOutException(
---> 80                 "The Unity environment took too long to respond. Make sure that :\n"
     81                 "\t The environment does not need user interaction to launch\n"
     82                 "\t The Academy and the External Brain(s) are attached to objects in the Scene\n"

UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
The environment does not need user interaction to launch
The Academy and the External Brain(s) are attached to objects in the Scene
The environment and the Python interface have compatible versions.

On GCP with tensorflow-gpu==1.13.0 and an NVIDIA P100 GPU, I still had the same problem.

Finally, the problem was solved when I uninstalled CUDA 10.0 and installed 9.0.

In local Docker it always tests fine, but I am getting the same error for every submission.

The Unity environment took too long to respond.

I think the primary problem with Google Colab is that you need to run it under an X server, using xvfb-run or something like that. But after solving that, there is a problem with the OpenGL version: the version being used is 3.1, rendered by llvmpipe, while Unity requires 3.2. Also, it seems that llvmpipe wouldn’t use the GPU for rendering anyway, so I think the environment would run slowly. I don’t really know how to solve that, though.
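To check the OpenGL version on a given machine, something like the sketch below may help. It assumes you have already captured the text output of glxinfo (for example by running it under xvfb-run) and that it contains the usual "OpenGL version string:" line. The sample text is illustrative, not taken from a real Colab session, and parse_gl_version and gl_is_sufficient are hypothetical helper names.

```python
import re

# Minimum OpenGL version mentioned in the post above.
REQUIRED_GL = (3, 2)


def parse_gl_version(glxinfo_output):
    """Extract (major, minor) from the 'OpenGL version string' line, or None."""
    match = re.search(r"^OpenGL version string:\s*(\d+)\.(\d+)",
                      glxinfo_output, re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def gl_is_sufficient(glxinfo_output, required=REQUIRED_GL):
    """Return True if the reported OpenGL version is at least `required`."""
    version = parse_gl_version(glxinfo_output)
    return version is not None and version >= required


# Illustrative sample only; real glxinfo output is much longer.
sample = (
    "OpenGL renderer string: llvmpipe (LLVM 6.0, 256 bits)\n"
    "OpenGL version string: 3.1 Mesa 18.0.5\n"
)
print(parse_gl_version(sample), gl_is_sufficient(sample))
```

If the check fails, the environment is unlikely to start regardless of how large timeout_wait is.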

When I tried to set up an X server on Google Colab, it raised this error: parse_vt_settings: Cannot open /dev/tty0 (No such file or directory)
Have you run into the same error, and if so, how did you solve it?