Evaluation error: Unity environment took too long to respond

shivam · March 7, 2019, 11:43pm

Can you share your submission link?

wywarren · March 8, 2019, 8:47am

https://gitlab.aicrowd.com/wywarren/obstacle-tower-challenge/issues/6

kwea123 · March 8, 2019, 1:03pm

Hi @shivam can you check mine? https://gitlab.aicrowd.com/kwea123/obstacle-tower-challenge-submission-kwea123/issues/4

I submitted a few test versions with the GPU disabled to check whether the GPU is the problem, so my latest submission is irrelevant.

ChenKuanSun · March 8, 2019, 2:05pm

https://gitlab.aicrowd.com/ChenKuanSun/obg/issues/1

If the evaluation system runs everything in the same environment, could the startup be failing because worker_id is not set while other submissions are being evaluated?
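
For readers wondering what worker_id affects: in mlagents_envs the environment listens on base_port + worker_id, so two environments on the same machine with the same worker_id would compete for one port. Whether that is what happens on the evaluator is not confirmed in this thread. The sketch below is only an illustration; the helper name first_free_worker_id and the default base port of 5005 are assumptions, not part of the starter kit.

```python
import socket


def first_free_worker_id(base_port=5005, max_workers=16, host="127.0.0.1"):
    """Return the smallest worker_id whose port (base_port + worker_id) can be bound.

    Hypothetical helper: base_port=5005 is assumed to match the ML-Agents default.
    """
    for worker_id in range(max_workers):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, base_port + worker_id))
            except OSError:
                continue  # port already in use, try the next worker_id
            return worker_id
    raise RuntimeError("no free port found in the probed range")
```

The returned value could then be passed as the worker_id argument of ObstacleTowerEnv. Note that the probe socket is closed before the environment starts, so another process could still grab the port in between.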

banjtheman · March 12, 2019, 2:42am

@shivam seems to be happening again

2019-03-12T02:22:06.918029502Z root
2019-03-12T02:22:17.32389638Z INFO:mlagents_envs:Start training by pressing the Play button in the Unity Editor.

https://gitlab.aicrowd.com/banjtheman/obstacle-tower-challenge/issues/9

The ~10 second startup time seems to be the issue, as noted in the thread Starter kit stuck "pending" state for a day.

Is it possible to delay the environment container by, say, 60 seconds after the agent container starts?

mohanty · March 12, 2019, 8:28am

@banjtheman: Wasn't a timeout parameter added to the env instantiation in v1.2?

banjtheman · March 12, 2019, 1:07pm

@mohanty yeah, I’ve increased mine to 600 (even 30000 once), but all that does is keep the agent container idle. It looks like if the environment container starts before the agent displays

 INFO:mlagents_envs:Start training by pressing the Play button in the Unity Editor.

The test never starts, and this is easily reproducible on a local environment

banjtheman · March 13, 2019, 3:03pm

I was able to get around this with the deferred-import strategy, so that the mlagents text appears 1 second after startup. It's a bit hacky, but it seems to be the only way to get the test to run.
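
The exact code used here was not posted. One reading of "deferred import" is to postpone importing and constructing the environment until the slow setup (for example model loading) has finished. The sketch below illustrates that ordering only; the helper names make_env_lazily and run are hypothetical, and the module and class names are passed as strings so nothing is imported at the top of the script.

```python
import importlib


def make_env_lazily(module_name, class_name, **env_kwargs):
    """Import module_name only at call time, then build class_name(**env_kwargs)."""
    module = importlib.import_module(module_name)
    env_class = getattr(module, class_name)
    return env_class(**env_kwargs)


def run(load_model, module_name="obstacle_tower_env",
        class_name="ObstacleTowerEnv", **env_kwargs):
    """Finish the slow setup first, then import and create the environment."""
    model = load_model()  # slow part: weights, TensorFlow session, etc.
    env = make_env_lazily(module_name, class_name, **env_kwargs)
    return model, env
```

With the starter kit this might be called as run(load_model, environment_filename=..., retro=True, timeout_wait=600), where the keyword names follow the ObstacleTowerEnv signature and load_model is whatever function builds your agent.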

2019-03-13T14:16:19.307396414Z root
2019-03-13T14:16:20.443927313Z INFO:mlagents_envs:Start training by pressing the Play button in the Unity Editor.

mohanty · March 13, 2019, 4:29pm

Please do send in a pull request to the official docs. Many participants seem to be having the same issue.

huixxi · March 17, 2019, 2:51pm

I ran the official tutorial code on GCP and changed timeout_wait to 6000000, but it still raises the same error. How did you fix the problem?

mohanty · March 17, 2019, 8:25pm

Are you doing the env instantiation before or after loading your model?

huixxi · March 18, 2019, 1:10am

I just put the ObstacleTower folder, which includes the obstacle.x86_64 file, in the proper directory. Then I ran train.py directly, as the tutorial says. Do I need to start obstacle.x86_64 manually before running train.py?

huixxi · March 18, 2019, 4:35am

I have also tested the env locally, and it worked well. But once I run the code on Google Colab, the error appears.

Petero · March 22, 2019, 9:56am

Hi, I have the same problem.

I am using AWS EC2.

UnityTimeOutException                     Traceback (most recent call last)
in ()
----> 1 env = ObstacleTowerEnv('/home/ubuntu/ObstacleTower/obstacletower', retro=True)

~/anaconda3/lib/python3.6/site-packages/obstacle_tower_env.py in __init__(self, environment_filename, docker_training, worker_id, retro, timeout_wait, realtime_mode)

~/anaconda3/lib/python3.6/site-packages/mlagents_envs/environment.py in __init__(self, file_name, worker_id, base_port, seed, docker_training, no_graphics, timeout_wait)
     67 )
     68 try:
---> 69 aca_params = self.send_academy_parameters(rl_init_parameters_in)
     70 except UnityTimeOutException:
     71 self._close()

~/anaconda3/lib/python3.6/site-packages/mlagents_envs/environment.py in send_academy_parameters(self, init_parameters)
    489 inputs = UnityInput()
    490 inputs.rl_initialization_input.CopyFrom(init_parameters)
--> 491 return self.communicator.initialize(inputs).rl_initialization_output
    492
    493 def wrap_unity_input(self, rl_input: UnityRLInput) -> UnityOutput:

~/anaconda3/lib/python3.6/site-packages/mlagents_envs/rpc_communicator.py in initialize(self, inputs)
     78 if not self.unity_to_external.parent_conn.poll(self.timeout_wait):
     79 raise UnityTimeOutException(
---> 80 "The Unity environment took too long to respond. Make sure that :\n"
     81 "\t The environment does not need user interaction to launch\n"
     82 "\t The Academy and the External Brain(s) are attached to objects in the Scene\n"

UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
The environment does not need user interaction to launch
The Academy and the External Brain(s) are attached to objects in the Scene
The environment and the Python interface have compatible versions.

Petero · March 23, 2019, 3:24am

On GCP with tensorflow-gpu==1.13.0 and an NVIDIA P100 GPU, I still have the same problem.

Petero · March 23, 2019, 1:13pm

Finally, when I uninstalled CUDA 10.0 and installed 9.0, the problem was solved.

kyunghyunlee · March 24, 2019, 1:04am

In local Docker, the tests always pass.
But I am getting the same error for every submission:

The Unity environment took too long to respond.

rafael_mariottini_to · March 25, 2019, 2:07am

I think the primary problem with Google Colab is that you need to run it with an X server, so you have to use xvfb-run or something like that. But after solving that, there is a problem with the OpenGL version. The version being used is 3.1, rendered by llvmpipe, while Unity requires 3.2. Also, it seems that llvmpipe wouldn’t use the GPU for rendering anyway, so I think the environment would run slowly. I don’t really know how to solve that, though.

huixxi · March 27, 2019, 4:34pm

When I tried to set up an X server on Google Colab, it raised this error: parse_vt_settings: Cannot open /dev/tty0 (No such file or directory)
Have you hit the same error, and how did you solve it?

rafael_mariottini_to · March 28, 2019, 6:12pm

I don’t think you can run an X server. But you can use xvfb-run (!apt-get install xvfb, I think). But then you would get OpenGL problems anyway. I didn’t get past that.
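
For anyone trying the xvfb-run route, here is a minimal sketch of how the training script could be wrapped in a virtual display. The helper name xvfb_command, the script name train.py and the screen size are assumptions for illustration. As noted above, this only provides a display and does not fix the OpenGL version reported by llvmpipe.

```python
import shutil
import sys


def xvfb_available():
    """True if the xvfb-run wrapper is on PATH (e.g. after apt-get install xvfb)."""
    return shutil.which("xvfb-run") is not None


def xvfb_command(script, screen="1024x768x24"):
    """Build a command that runs `script` with this Python under a virtual display."""
    return [
        "xvfb-run",
        "--auto-servernum",                      # pick a free display number
        f"--server-args=-screen 0 {screen}",     # virtual screen geometry and depth
        sys.executable,
        script,
    ]
```

The resulting list can be passed to subprocess.run(xvfb_command("train.py"), check=True) once xvfb_available() returns True.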