Not able to connect to the DeepRacer environment

shardul_gharat · September 16, 2021, 5:52am

I tried to set up the starter kit on my local machine. I followed the procedure, but when I try to run random_actions_example.py, I get the following error.

Here is also the output from the simulator.

I also received the following error while starting the environment. I am not sure whether it is related to the issue above.

dipam · September 16, 2021, 7:45am

Hi @shardul_gharat

Do you get the random_actions_example.py issue even after the simulator docker has started? Once the docker output says ====Waiting for gym client=====, random_actions_example.py should be able to connect.
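
For anyone scripting this, here is a minimal sketch of waiting until the simulator port accepts connections before launching the example. The host 127.0.0.1 and port 8888 are the defaults that appear in DeepracerZMQClient in this thread; `wait_for_port` and its parameters are hypothetical names, not part of the starter kit. An open port only shows that something is listening (with Docker port publishing this may be true before the simulator is ready), so treat it as a first check rather than a guarantee.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=8888, timeout_s=120.0, poll_s=1.0):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect only proves that something is listening.
            with socket.create_connection((host, port), timeout=poll_s):
                return True
        except OSError:
            # Refused or timed out: wait a little and try again.
            time.sleep(poll_s)
    return False
```

Calling `wait_for_port()` right before `gym.make(...)` would block until the port opens or two minutes pass, whichever comes first.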

shardul_gharat · September 16, 2021, 8:15am

Yes, I am trying after the docker output says it is waiting for the gym client. It's not working.

Here is a screenshot of both terminals. That message is visible in the screenshot.

shardul_gharat · September 16, 2021, 8:17am

Here is one more error that I got this time from the docker.

nickhodem · September 16, 2021, 4:24pm

I had the same issue.
I tried increasing the zmq.SNDTIMEO and zmq.RCVTIMEO values in DeepracerZMQClient.
It actually helped, but I don’t know whether that was the cause.

Here is my code:

import zmq


class DeepracerZMQClient:
    def __init__(self, host="127.0.0.1", port=8888):
        self.host = host
        self.port = port
        self.socket = zmq.Context().socket(zmq.REQ)
        # Timeouts are in milliseconds: 200000 ms = 200 s.
        self.socket.set(zmq.SNDTIMEO, 200000)
        self.socket.set(zmq.RCVTIMEO, 200000)
        self.socket.connect(f"tcp://{self.host}:{self.port}")

DRJJ · September 16, 2021, 7:32pm

I had the same problem. It took me a little while to see what you changed there, but your solution works for me. Thanks.

shardul_gharat · September 16, 2021, 11:07pm

Thanks @nickhodem your solution worked.

dipam · September 17, 2021, 7:06am

Hi @nickhodem

Thanks for trying this change; however, I think it should not have been needed. The timeout values were previously 20 seconds, and your settings raise them to 200 seconds. These timeouts are there to detect the case where the docker goes down. But it's fine as a temporary workaround.

We’ll provide a solution so that you guys don’t need to wait for the docker to start and then start the env.
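
In the meantime, a small sketch of making the timeout configurable without editing the number in the class each time. ZeroMQ's SNDTIMEO and RCVTIMEO options take milliseconds, which is why 200000 corresponds to 200 seconds. The environment variable name `DEEPRACER_ZMQ_TIMEOUT_S` and the helper `timeout_ms` are made up for illustration and are not part of the starter kit; the 20-second default mirrors the previous value mentioned above.

```python
import os


def timeout_ms(env=None, var="DEEPRACER_ZMQ_TIMEOUT_S", default_s=20.0):
    """Read a timeout in seconds from the environment, return milliseconds."""
    env = os.environ if env is None else env
    raw = env.get(var)
    seconds = default_s if raw is None else float(raw)
    if seconds <= 0:
        raise ValueError(f"{var} must be positive, got {seconds}")
    # ZeroMQ socket timeouts are expressed in milliseconds.
    return int(seconds * 1000)
```

The result could then be passed to the socket, for example `self.socket.set(zmq.RCVTIMEO, timeout_ms())`, so a slow machine only needs the variable set to 200.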

j_langley9 · September 21, 2021, 5:47am

It appears that there is some startup of the gym (docker) after the random_actions_example tries to connect. In my case, this was taking longer than 20 seconds. I was able to get it to work after making the update to the 200 second timeout.

In the screenshot, the example code is launched, creating the “Agent Ready” line. The example code continues and starts the main loop after the “Reset agent finished” line. This is the piece that I believe was timing out, since it takes longer than 20 seconds on my laptop.
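
An alternative to one long timeout would be to retry the first request until an overall deadline passes. The sketch below is only an illustration of that idea: `call_with_retries` and `attempt` are hypothetical names, and `attempt` is assumed to open its own connection, make one request, and raise `TimeoutError` if no reply arrives. With pyzmq a receive timeout surfaces as `zmq.Again`, so the caller would have to translate that, and a REQ socket that has timed out generally needs to be recreated before sending again.

```python
import time


def call_with_retries(attempt, total_timeout_s=200.0, pause_s=1.0):
    """Call attempt() until it succeeds or total_timeout_s elapses.

    attempt is a zero-argument callable that raises TimeoutError on failure.
    """
    deadline = time.monotonic() + total_timeout_s
    tries = 0
    while True:
        tries += 1
        try:
            return attempt()
        except TimeoutError as exc:
            # Stop once another pause would take us past the deadline.
            if time.monotonic() + pause_s > deadline:
                raise TimeoutError(f"gave up after {tries} attempts") from exc
            time.sleep(pause_s)
```

This keeps each individual wait short while still tolerating a simulator that needs well over 20 seconds to finish resetting.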

jangkj09 · September 27, 2021, 7:32pm

Hi, I am just trying to set this up as well and running into similar issues. Is there an official fix? Also, can someone post what the output should look like if the random agent executes correctly? Thanks.

ngocthachhoang · September 28, 2021, 4:19pm

Here is what I got

jangkj09 · September 29, 2021, 2:05am

@ngocthachhoang Thanks! This helps a lot!

azam_kamranian · September 30, 2021, 10:24pm

Thank you, it works, but I have to start the docker with “source deepracer-gym/start_deepracer_docker.sh” every time before running the agent!

Also, env.render() does not work! Can someone help me with that?

notnanton · October 1, 2021, 2:02pm

Same. After running an agent once, the docker container crashes when I try to run again. Then I can restart the docker and get one more try. This is the output I get


```
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
{"simapp_exception": {"date": "2021-10-01 13:55:25.742803", "function": "rollout_worker.py::rollout_entry::555", "message": "Rollout worker exited with exception: 'action'", "exceptionType": "simulation_worker.exceptions", "eventType": "system_error", "errorCode": "500"}}
ERROR: FAULT_CODE: 0
simapp_exit_gracefully: simapp_exit--1
Terminating simapp simulation...
simapp_exit_gracefully - callstack trace=Traceback (callstack)
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/rollout_worker.py", line 558, in <module>
    rollout_entry()
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/rollout_worker.py", line 555, in rollout_entry
    SIMAPP_EVENT_ERROR_CODE_500)
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/log_handler/exception_handler.py", line 74, in log_and_exit
    s3_crash_status_file_name=s3_crash_status_file_name)
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/log_handler/exception_handler.py", line 179, in simapp_exit_gracefully
    callstack_trace = ''.join(traceback.format_stack())
simapp_exit_gracefully - exception trace=Traceback (most recent call last):
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/rollout_worker.py", line 538, in rollout_entry
    main()
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/rollout_worker.py", line 532, in main
    unpause_physics=unpause_physics
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/rollout_worker.py", line 200, in rollout_worker
    graph_manager.act(act_steps, wait_for_full_episodes=graph_manager.agent_params.algorithm.act_for_full_episodes)
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/multi_agent_coach/multi_agent_graph_manager.py", line 438, in act
    done = self.top_level_manager.step(None)
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/multi_agent_coach/multi_agent_level_manager.py", line 226, in step
    action_infos = [agent.act() for agent in self.agents.values()]
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/multi_agent_coach/multi_agent_level_manager.py", line 226, in <listcomp>
    action_infos = [agent.act() for agent in self.agents.values()]
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/gym_agent.py", line 54, in act
    action = self._recieved_action['action']
KeyError: 'action'

simapp_exit_gracefully - skipping s3 upload.
simapp_exit_gracefully - Job type is SageOnly. Killing SimApp and Training jobs by PID
simapp_exit_gracefully - Waiting for simapp and training job to come up.
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
simapp_exit_gracefully - Waiting for simapp and training job to come up.
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
simapp_exit_gracefully - Waiting for simapp and training job to come up.
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
simapp_exit_gracefully - Waiting for simapp and training job to come up.
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
simapp_exit_gracefully - Stopped waiting. SimApp Pid Exists=True, Training Pid Exists=False.
+ exit
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
      after 9243 requests (791 known processed) with 0 events remaining.
```

rob_fitzgerald1 · March 19, 2022, 2:33pm

Getting this error…
gym.error.NameNotFound: Environment deepracer_gym:deepracer doesn’t exist.

Does anyone have a fix for this?
Ty
This is the entire error message…
usr/lib/python3/dist-packages/apport/report.py:13: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import fnmatch, glob, traceback, errno, sys, atexit, locale, imp, stat
Traceback (most recent call last):
  File "random_actions_example.py", line 7, in <module>
    env = gym.make('deepracer_gym:deepracer-v0')
  File "/home/parallels/.local/lib/python3.8/site-packages/gym/envs/registration.py", line 676, in make
    return registry.make(id, **kwargs)
  File "/home/parallels/.local/lib/python3.8/site-packages/gym/envs/registration.py", line 490, in make
    versions = self.env_specs.versions(namespace, name)
  File "/home/parallels/.local/lib/python3.8/site-packages/gym/envs/registration.py", line 220, in versions
    self._assert_name_exists(namespace, name)
  File "/home/parallels/.local/lib/python3.8/site-packages/gym/envs/registration.py", line 297, in _assert_name_exists
    raise error.NameNotFound(message)
gym.error.NameNotFound: Environment deepracer_gym:deepracer doesn't exist.