Not able to connect to the Deepracer environment

I tried to set up the starter kit on my local machine. I followed the procedure, but when I try to run random_actions_example.py, I get the following error.

Also here is the output from the simulator

I also received the following error while the environment was starting. I am not sure whether it is related to the issue above.

Hi @shardul_gharat

Do you get the random_actions_example.py issue even after the simulator docker has started? Once the docker output says ====Waiting for gym client=====, random_actions_example.py should be able to connect.
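
For anyone who wants to script the "wait for the docker, then start the client" step, here is a minimal sketch. It only assumes that the simulator prints a line containing "Waiting for gym client", as quoted above. The function name `wait_for_ready` and the container name in the usage comment are made up for illustration and are not part of the starter kit.

```python
from typing import Iterable

# Substring of the readiness line quoted in this thread.
READY_MARKER = "Waiting for gym client"


def wait_for_ready(lines: Iterable[str], marker: str = READY_MARKER) -> bool:
    """Consume log lines until one contains the marker.

    Returns True as soon as the marker is seen, or False if the
    stream ends without it (for example, the container exited).
    """
    for line in lines:
        if marker in line:
            return True
    return False


# Hypothetical usage; "deepracer" is a placeholder container name:
#
#   import subprocess
#   proc = subprocess.Popen(
#       ["docker", "logs", "-f", "deepracer"],
#       stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
#   )
#   if wait_for_ready(proc.stdout):
#       print("simulator is ready, start random_actions_example.py now")
```

Note that seeing the marker only tells you the simulator reached that point in its startup. As later replies in this thread show, the first reset can still take longer than the client timeout.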

Yes, I am trying it after the docker output says it is waiting for the gym client. It's not working.

Here is a screenshot of both terminals. The message is visible in this screenshot.

Here is one more error that I got from the docker this time.

I had the same issue.
I tried increasing the values of zmq.SNDTIMEO and zmq.RCVTIMEO in DeepracerZMQClient.
It actually helped, but I don't know whether the timeout was the real cause.

Here is my code:

import zmq

class DeepracerZMQClient:
    def __init__(self, host="127.0.0.1", port=8888):
        self.host = host
        self.port = port
        self.socket = zmq.Context().socket(zmq.REQ)
        # ZeroMQ timeouts are in milliseconds: 200000 ms = 200 s.
        self.socket.set(zmq.SNDTIMEO, 200000)
        self.socket.set(zmq.RCVTIMEO, 200000)
        self.socket.connect(f"tcp://{self.host}:{self.port}")

I had the same problem. It took me a little while to see what you changed there, but your solution works for me. Thanks!

Thanks @nickhodem, your solution worked.

Hi @nickhodem

Thanks for trying this change; however, I think it should not have been needed. The timeout values were previously 20 seconds, and your settings raise them to 200 seconds. These timeouts exist to detect the docker going down. But it's fine as a temporary workaround.
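
Since ZeroMQ's SNDTIMEO and RCVTIMEO options take milliseconds while this discussion talks in seconds, a small helper can keep the two straight and avoid hard-coding the value. This is only a sketch: the function name and the `DEEPRACER_ZMQ_TIMEOUT_S` environment variable are hypothetical and do not exist in the starter kit.

```python
import os

# 20 seconds is the original timeout mentioned in this thread.
DEFAULT_TIMEOUT_S = 20.0


def zmq_timeout_ms(env_var: str = "DEEPRACER_ZMQ_TIMEOUT_S",
                   default_s: float = DEFAULT_TIMEOUT_S) -> int:
    """Return a timeout in milliseconds, the unit zmq.SNDTIMEO/RCVTIMEO expect.

    The value is read in seconds from a (hypothetical) environment
    variable and falls back to default_s when the variable is unset.
    """
    raw = os.environ.get(env_var)
    seconds = float(raw) if raw else default_s
    if seconds <= 0:
        raise ValueError(f"timeout must be positive, got {seconds}")
    return int(round(seconds * 1000))
```

With something like this, the client could call `self.socket.set(zmq.RCVTIMEO, zmq_timeout_ms())`, and a slow machine could be given more time by exporting the variable instead of editing the source.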

We’ll provide a solution so that you guys don’t need to wait for the docker to start and then start the env.

It appears that there is some startup of the gym (docker) after the random_actions_example tries to connect. In my case, this was taking longer than 20 seconds. I was able to get it to work after making the update to the 200 second timeout.

In the screenshot, the example code is launched, creating the “Agent Ready” line. The example code continues and starts the main loop after the “Reset agent finished” line. This is the piece that I believe was timing out, since it takes longer than 20 seconds on my laptop.
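
An alternative to one very long timeout is to keep the timeout short and retry a few times. The sketch below is a generic retry loop, not code from the starter kit; `retry_with_fresh_client` and its parameters are invented for illustration. It assumes the caller passes a function that builds a new client and performs the first request on every attempt, because a ZeroMQ REQ socket that timed out waiting for a reply generally cannot be reused for another send.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_fresh_client(
    attempt: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 10,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call attempt() until it succeeds or max_attempts is reached.

    attempt should create its own client/socket each time it is called.
    Only exceptions listed in retry_on trigger a retry; anything else
    propagates immediately.
    """
    last_error: Optional[BaseException] = None
    for attempt_no in range(1, max_attempts + 1):
        try:
            return attempt()
        except retry_on as err:
            last_error = err
            # Pause between attempts, but not after the final one.
            if attempt_no < max_attempts:
                sleep(delay_s)
    raise TimeoutError(f"gave up after {max_attempts} attempts") from last_error
```

With pyzmq, a receive timeout surfaces as `zmq.Again`, so a caller would presumably pass `retry_on=(zmq.Again,)`. Whether the starter kit's client lets that exception propagate unchanged is an assumption, so check what your version actually raises.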

Hi, I am just trying to set this up as well and running into similar issues. Is there an official fix? Also, can someone post what the output should look like if the random agent executes correctly? Thanks.

Here is what I got

@ngocthachhoang Thanks! This helps a lot!

Thank you. It works, but I have to start the docker with "source deepracer-gym/start_deepracer_docker.sh" every time before running the agent!

Also, env.render() does not work! Can someone help me with that?

Same. After running an agent once, the docker container crashes when I try to run again. Then I can restart the docker and get one more try. This is the output I get


```
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...                                                                                                                                                   
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...                                                                                                                                                   
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...                                                                                                                                                   
{"simapp_exception": {"date": "2021-10-01 13:55:25.742803", "function": "rollout_worker.py::rollout_entry::555", "message": "Rollout worker exited with exception: 'action'", "exceptionType": "simulation_worker.exceptions", "eventType": "system_error", "errorCode": "500"}}
ERROR: FAULT_CODE: 0                                                                                                                                                                                      
simapp_exit_gracefully: simapp_exit--1                                                                                                                                                                    
Terminating simapp simulation...                                                                                                                                                                          
simapp_exit_gracefully - callstack trace=Traceback (callstack)                                                                                                                                            
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main                                                                                                                                    
    "__main__", mod_spec)                                                                                                                                                                                 
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code                                          
    exec(code, run_globals)                                                                          
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/rollout_worker.py", line 558, in <module>
    rollout_entry()                                                                                                                                                                                       
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/rollout_worker.py", line 555, in rollout_entry            
    SIMAPP_EVENT_ERROR_CODE_500)                                                                     
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/log_handler/exception_handler.py", line 74, in log_and_exit
    s3_crash_status_file_name=s3_crash_status_file_name)                                             
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/log_handler/exception_handler.py", line 179, in simapp_exit_gracefully
    callstack_trace = ''.join(traceback.format_stack()) 
simapp_exit_gracefully - exception trace=Traceback (most recent call last):                          
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/rollout_worker.py", line 538, in rollout_entry                                                                          
    main()                                                                                           
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/rollout_worker.py", line 532, in main
    unpause_physics=unpause_physics               
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/rollout_worker.py", line 200, in rollout_worker
    graph_manager.act(act_steps, wait_for_full_episodes=graph_manager.agent_params.algorithm.act_for_full_episodes)
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/multi_agent_coach/multi_agent_graph_manager.py", line 438, in act
    done = self.top_level_manager.step(None)                                                         
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/multi_agent_coach/multi_agent_level_manager.py", line 226, in step
    action_infos = [agent.act() for agent in self.agents.values()]                                   
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/multi_agent_coach/multi_agent_level_manager.py", line 226, in <listcomp>
    action_infos = [agent.act() for agent in self.agents.values()]                                                                                                                                        
  File "/opt/amazon/install/sagemaker_rl_agent/lib/python3.6/site-packages/markov/gym_agent.py", line 54, in act
    action = self._recieved_action['action']
KeyError: 'action'

simapp_exit_gracefully - skipping s3 upload.
simapp_exit_gracefully - Job type is SageOnly. Killing SimApp and Training jobs by PID
simapp_exit_gracefully - Waiting for simapp and training job to come up.
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...                                              
simapp_exit_gracefully - Waiting for simapp and training job to come up.                             
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...                                              
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...                                              
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
simapp_exit_gracefully - Waiting for simapp and training job to come up.
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
simapp_exit_gracefully - Waiting for simapp and training job to come up.
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
AgentsVideoEditor._mp4_queue['0'] is empty. Retrying...
simapp_exit_gracefully - Stopped waiting. SimApp Pid Exists=True, Training Pid Exists=False.
+ exit
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
      after 9243 requests (791 known processed) with 0 events remaining.
```