In this post, we will try to demystify the random agent code provided in the starter kit. The most common question we heard from participants was about the difference between the models and algorithms directories.
Our idea was to put all the models inside the models directory: these are the RL policy networks that you will build using convolutional layers, dense layers, and so on. The algorithms that govern the optimization of these RL policies go into the algorithms directory.
Implementing a custom random agent
Now, let's look at what the code in the CustomRandomAgent class does. Our random agent does not learn anything: it simply returns random actions and collects rewards. First, we need to create the environment.
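In the starter kit the environment is the Procgen environment created through the `procgen_env_wrapper`; to keep this walkthrough self-contained, here is a minimal gym-style stand-in (the `DummyEnv` class below is purely illustrative, not the starter kit's actual wrapper):

```python
import random

# Illustrative stand-in with a gym-like API (reset/step). In the starter
# kit, the real env is the Procgen environment built via the
# procgen_env_wrapper; this dummy exists only so the snippet runs standalone.
class DummyEnv:
    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # observation (a real Procgen env returns an image)

    def step(self, action):
        self.t += 1
        reward = random.random()
        done = self.t >= self.episode_length
        return 0, reward, done, {}  # obs, reward, done, info

env = DummyEnv()
obs = env.reset()
```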
Now that we have the env ready, let's sample random actions and run the episode until it finishes.
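A single rollout might be sketched as follows (again using an illustrative dummy env so the snippet runs standalone; Procgen's action space is Discrete(15), hence the `randrange(15)`):

```python
import random

# Stand-in gym-style env; in the starter kit this is the Procgen env.
class DummyEnv:
    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0
    def reset(self):
        self.t = 0
        return 0
    def step(self, action):
        self.t += 1
        return 0, random.random(), self.t >= self.episode_length, {}

env = DummyEnv()
obs = env.reset()
done = False
episode_reward = 0.0
while not done:
    action = random.randrange(15)        # sample a random Procgen action
    obs, reward, done, info = env.step(action)
    episode_reward += reward             # collect the rewards
```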
When training the agent, we want to run this loop rollouts_per_iteration times.
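The iteration loop simply repeats the episode rollout and keeps each episode's total reward. A sketch (the `rollouts_per_iteration` value and the dummy env are illustrative; the starter kit reads this from its config):

```python
import random

# Stand-in gym-style env so the sketch runs standalone; in the starter kit
# this is the Procgen environment.
class DummyEnv:
    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0
    def reset(self):
        self.t = 0
        return 0
    def step(self, action):
        self.t += 1
        return 0, random.random(), self.t >= self.episode_length, {}

env = DummyEnv()
rollouts_per_iteration = 5  # illustrative value
episode_rewards = []
for _ in range(rollouts_per_iteration):
    obs = env.reset()
    done = False
    episode_reward = 0.0
    while not done:
        # Procgen's action space is Discrete(15)
        obs, reward, done, _ = env.step(random.randrange(15))
        episode_reward += reward
    episode_rewards.append(episode_reward)
```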
Now, let's collect the rewards and return a dict containing training stats for a given iteration.
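The per-iteration stats are typically aggregates over the collected episode rewards. A sketch, using the key names RLlib conventionally reports (the starter kit's exact keys may differ):

```python
# Example episode rewards from one iteration (made-up values).
episode_rewards = [4.2, 5.0, 3.8]

# Aggregate into the kind of dict RLlib reports per training iteration.
stats = {
    "episode_reward_mean": sum(episode_rewards) / len(episode_rewards),
    "episode_reward_min": min(episode_rewards),
    "episode_reward_max": max(episode_rewards),
    "episodes_this_iter": len(episode_rewards),
}
```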
That’s it! You can find the complete code for this agent at https://github.com/AIcrowd/neurips2020-procgen-starter-kit/blob/f8b943bffaf2c86a4c78043fcb0f1253ab1b42ba/algorithms/custom_random_agent/custom_random_agent.py
Now, how does RLlib know about this custom agent we want to use? We have a custom registry for this. First, list your Python class as a custom algorithm in the registry file.
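The registry is essentially a mapping from a string name to the agent class. A simplified sketch of the idea (the starter kit's actual registry file may structure this differently):

```python
# Simplified sketch of an algorithm registry: map a name to the class so
# it can be looked up later. The class here is a stand-in for the real
# CustomRandomAgent; the dict and helper names are illustrative.
class CustomRandomAgent:
    pass

CUSTOM_ALGORITHMS = {
    "custom/CustomRandomAgent": CustomRandomAgent,
}

def get_algorithm(name):
    return CUSTOM_ALGORITHMS[name]
```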
This will register the random agent class with the name custom/CustomRandomAgent. Now we need to add this to our experiments YAML file.
```yaml
procgen-example:
  env: "procgen_env_wrapper"
  run: "custom/CustomRandomAgent"
```
So how does RLlib know that it has to use these algorithms? We register all the custom algorithms and models in the train.py file!