In this post, we will try to demystify the random agent code provided in the starter kit. The most common question we heard from participants was about the difference between the models and algorithms directories.
Our idea was to put all the models inside the models directory: these are the RL policy networks that you will build using convolutional layers, dense layers, and so on. The algorithms that govern the optimization of these RL policies go into the algorithms directory.
Implementing a custom random agent
Now, let's look at what the code in the CustomRandomAgent class does. Our random agent does not learn anything: it simply returns random actions and collects rewards. First, we need to create the environment.
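In the starter kit the environment is the Procgen environment created through the `procgen_env_wrapper`; to keep this walkthrough self-contained, here is a minimal gym-style stand-in (the `DummyEnv` class below is purely illustrative, not the starter kit's actual wrapper):

```python
import random

# Illustrative stand-in with a gym-like API (reset/step). In the starter
# kit, the real env is the Procgen environment built via the
# procgen_env_wrapper; this dummy exists only so the snippet runs standalone.
class DummyEnv:
    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # observation (a real Procgen env returns an image)

    def step(self, action):
        self.t += 1
        reward = random.random()
        done = self.t >= self.episode_length
        return 0, reward, done, {}  # obs, reward, done, info

env = DummyEnv()
obs = env.reset()
```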
Now that we have the env ready, let's sample random actions and run the episode until it finishes.
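A single rollout might be sketched as follows (again using an illustrative dummy env so the snippet runs standalone; Procgen's action space is Discrete(15), hence the `randrange(15)`):

```python
import random

# Stand-in gym-style env; in the starter kit this is the Procgen env.
class DummyEnv:
    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0
    def reset(self):
        self.t = 0
        return 0
    def step(self, action):
        self.t += 1
        return 0, random.random(), self.t >= self.episode_length, {}

env = DummyEnv()
obs = env.reset()
done = False
episode_reward = 0.0
while not done:
    action = random.randrange(15)        # sample a random Procgen action
    obs, reward, done, info = env.step(action)
    episode_reward += reward             # collect the rewards
```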
When training the agent, we want to run this loop rollouts_per_iteration times.
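The iteration loop simply repeats the episode rollout and keeps each episode's total reward. A sketch (the `rollouts_per_iteration` value and the dummy env are illustrative; the starter kit reads this from its config):

```python
import random

# Stand-in gym-style env so the sketch runs standalone; in the starter kit
# this is the Procgen environment.
class DummyEnv:
    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0
    def reset(self):
        self.t = 0
        return 0
    def step(self, action):
        self.t += 1
        return 0, random.random(), self.t >= self.episode_length, {}

env = DummyEnv()
rollouts_per_iteration = 5  # illustrative value
episode_rewards = []
for _ in range(rollouts_per_iteration):
    obs = env.reset()
    done = False
    episode_reward = 0.0
    while not done:
        # Procgen's action space is Discrete(15)
        obs, reward, done, _ = env.step(random.randrange(15))
        episode_reward += reward
    episode_rewards.append(episode_reward)
```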
Now, let's collect the rewards and return a dict containing training stats for a given iteration.
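The per-iteration stats are typically aggregates over the collected episode rewards. A sketch, using the key names RLlib conventionally reports (the starter kit's exact keys may differ):

```python
# Example episode rewards from one iteration (made-up values).
episode_rewards = [4.2, 5.0, 3.8]

# Aggregate into the kind of dict RLlib reports per training iteration.
stats = {
    "episode_reward_mean": sum(episode_rewards) / len(episode_rewards),
    "episode_reward_min": min(episode_rewards),
    "episode_reward_max": max(episode_rewards),
    "episodes_this_iter": len(episode_rewards),
}
```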
That’s it! You can find the complete code for this agent at https://github.com/AIcrowd/neurips2020-procgen-starter-kit/blob/f8b943bffaf2c86a4c78043fcb0f1253ab1b42ba/algorithms/custom_random_agent/custom_random_agent.py
Now, how does RLlib know about this custom agent we want to use? We have a custom registry for this. First, list your Python class as a custom algorithm in the registry file.
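The registry is essentially a mapping from a string name to the agent class. A simplified sketch of the idea (the starter kit's actual registry file may structure this differently):

```python
# Simplified sketch of an algorithm registry: map a name to the class so
# it can be looked up later. The class here is a stand-in for the real
# CustomRandomAgent; the dict and helper names are illustrative.
class CustomRandomAgent:
    pass

CUSTOM_ALGORITHMS = {
    "custom/CustomRandomAgent": CustomRandomAgent,
}

def get_algorithm(name):
    return CUSTOM_ALGORITHMS[name]
```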
This will register the random agent class with the name custom/CustomRandomAgent. Now we need to add this to our experiments YAML file.
```yaml
procgen-example:
  env: "procgen_env_wrapper"
  run: "custom/CustomRandomAgent"
```
So how does RLlib know that it has to use these algorithms? We register all the custom algorithms and models in the train.py file!