FAQ: Implementing a custom random agent

In this post, we will try to demystify the random agent code provided in the starter kit. The most common question we heard from participants was about the difference between the models and algorithms directories.

Our idea was to put all the models inside the models directory. These are the RL policy networks that you will build using convolutional layers, dense layers, and so on. The algorithms that govern the optimization of the RL policies go into the algorithms directory.

Implementing a custom random agent

Now, let's look at what the code in the CustomRandomAgent class does. Our random agent does not learn anything; it simply returns random actions and collects rewards. First, we need to create the environment.
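In the starter kit the environment comes from the registered procgen_env_wrapper. As a minimal sketch, assuming you only need the Gym-style reset()/step() interface, here is a hypothetical toy environment standing in for it, so the agent logic below can run even without procgen installed:

```python
import random

class ToyEnv:
    """Hypothetical stand-in for the starter kit's procgen_env_wrapper.

    It only mimics the Gym-style reset()/step() interface; the real
    environment returns image observations and game-specific rewards.
    """
    def __init__(self, episode_length=5, num_actions=15):
        self.episode_length = episode_length
        self.num_actions = num_actions  # Procgen games use 15 discrete actions
        self._t = 0

    def reset(self):
        self._t = 0
        return 0  # dummy observation

    def step(self, action):
        self._t += 1
        done = self._t >= self.episode_length
        return 0, 1.0, done, {}  # obs, reward, done, info

    def sample_action(self):
        return random.randrange(self.num_actions)

env = ToyEnv()
obs = env.reset()
```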

Now that we have the env ready, let's sample random actions and step the environment until the episode finishes.
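The loop can be sketched like this; ToyEnv is a hypothetical stand-in for the real procgen environment, included only so the snippet is self-contained:

```python
import random

class ToyEnv:
    """Hypothetical stand-in for the procgen env (reset/step only)."""
    def reset(self):
        self._t = 0
        return 0
    def step(self, action):
        self._t += 1
        return 0, 1.0, self._t >= 5, {}  # obs, reward, done, info
    def sample_action(self):
        return random.randrange(15)

def run_episode(env):
    """Step the env with uniformly random actions until the episode ends."""
    env.reset()
    done = False
    episode_reward = 0.0
    while not done:
        _obs, reward, done, _info = env.step(env.sample_action())
        episode_reward += reward
    return episode_reward
```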

When training the agent, we want to run this loop rollouts_per_iteration times per training iteration.
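Building on the episode loop, a sketch of one iteration's rollouts (again with a hypothetical toy env so the snippet runs standalone):

```python
import random

class ToyEnv:
    """Hypothetical stand-in for the procgen env (reset/step only)."""
    def reset(self):
        self._t = 0
        return 0
    def step(self, action):
        self._t += 1
        return 0, 1.0, self._t >= 5, {}
    def sample_action(self):
        return random.randrange(15)

def run_episode(env):
    """One full episode of random actions; returns the total reward."""
    env.reset()
    done, total = False, 0.0
    while not done:
        _obs, reward, done, _info = env.step(env.sample_action())
        total += reward
    return total

# Repeat the rollout rollouts_per_iteration times per training iteration.
rollouts_per_iteration = 3
episode_rewards = [run_episode(ToyEnv()) for _ in range(rollouts_per_iteration)]
```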

Now, let’s collect the rewards and return a dict containing training stats for a given iteration.
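Putting the pieces together, the training step can be sketched as below. The stat keys are modeled loosely on RLlib's result dict; the exact names and the toy environment here are illustrative, not the starter kit's actual code:

```python
import random

class ToyEnv:
    """Hypothetical stand-in for the procgen env (reset/step only)."""
    def reset(self):
        self._t = 0
        return 0
    def step(self, action):
        self._t += 1
        return 0, 1.0, self._t >= 5, {}
    def sample_action(self):
        return random.randrange(15)

def run_episode(env):
    """One full episode of random actions; returns the total reward."""
    env.reset()
    done, total = False, 0.0
    while not done:
        _obs, reward, done, _info = env.step(env.sample_action())
        total += reward
    return total

def train(env, rollouts_per_iteration=3):
    """One training iteration: collect episode rewards, return stats."""
    rewards = [run_episode(env) for _ in range(rollouts_per_iteration)]
    return {
        "episode_reward_mean": sum(rewards) / len(rewards),
        "episode_reward_min": min(rewards),
        "episode_reward_max": max(rewards),
        "episodes_this_iter": len(rewards),
    }
```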

That’s it! You can find the complete code for this agent at https://github.com/AIcrowd/neurips2020-procgen-starter-kit/blob/f8b943bffaf2c86a4c78043fcb0f1253ab1b42ba/algorithms/custom_random_agent/custom_random_agent.py

Now, how does RLlib know that there is a custom agent we want to use? We have a custom registry for this. First, list your Python class as a custom algorithm in the registry.
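Conceptually, the registry is just a mapping from a string name to the agent class. A hypothetical sketch (the starter kit's actual registry file and the RLlib registration call may differ):

```python
# Hypothetical name -> class registry, for illustration only.
CUSTOM_ALGORITHMS = {}

def register_algorithm(name, cls):
    """Map a string name to an agent class so config files can refer to it."""
    CUSTOM_ALGORITHMS[name] = cls

class CustomRandomAgent:  # placeholder for the real agent class
    pass

register_algorithm("custom/CustomRandomAgent", CustomRandomAgent)
```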

This will register the random agent class with the name custom/CustomRandomAgent. Now we need to add this to our experiments YAML file.

procgen-example:
  env: "procgen_env_wrapper"
  run: "custom/CustomRandomAgent"

So how does RLlib know that it has to use these algorithms? We register all the custom algorithms and models in the train.py file!
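At startup, a train.py-style entry point can simply walk the registries and hand every entry to the framework's registration hook (in RLlib this would be ray.tune.registry.register_trainable). A stdlib-only sketch, with all names and entries hypothetical:

```python
# Hypothetical registries; the starter kit's actual contents differ.
CUSTOM_ALGORITHMS = {"custom/CustomRandomAgent": object}
CUSTOM_MODELS = {"custom/MyModel": object}

REGISTERED = {}

def register_trainable(name, cls):
    """Stand-in for the framework's registration hook."""
    REGISTERED[name] = cls

def register_all():
    # Register every custom algorithm and model under its string name.
    for name, cls in {**CUSTOM_ALGORITHMS, **CUSTOM_MODELS}.items():
        register_trainable(name, cls)

register_all()
```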

Other resources on algorithms

Implement a custom loss function while leaving everything else as is: https://docs.ray.io/en/master/rllib-concepts.html

More examples


These instructions do not solve the problem described here. Here are the logs. For example, you could create a branch in the repository with a working random agent that can be submitted without modification.