Rllib custom env

bob_wei · July 6, 2020, 8:28pm

Hi, i’m wondering if there is any way to add a custom wrapper around the procgen/gym env. I know there is already a wrapper being used but we are not able to access that in submissions. Specifically I want to be able to process the observations before they are sent to model in ‘input_dict’

jyotish · July 7, 2020, 5:43am

Hello @bob_wei

You can use a custom preprocessor for this.

github.com

AIcrowd/neurips2020-procgen-starter-kit/blob/master/preprocessors/custom_preprocessor.py

#!/usr/bin/env python

from ray.rllib.models.preprocessors import Preprocessor


class MyPreprocessorClass(Preprocessor):
    """Custom preprocessing for observations

    Adopted from https://docs.ray.io/en/master/rllib-models.html#custom-preprocessors
    """

    def _init_shape(self, obs_space, options):
        return obs_space.shape  # New shape after preprocessing

    def transform(self, observation):
        # Do your custom stuff
        return observation

bob_wei · July 7, 2020, 7:45pm

Hi @jyotish thanks for the reply. I don’t think I was specific enough, but do you know of a better way to access multiple frames across time? (e.g. for stacking frames). I have tried using an rnn state based approach to do this but inference performance and memory usage is not terrible.

anton_makiievskyi · July 11, 2020, 4:25pm

Is it possible to use my own wrapper during training, but default environment during evaluation?
It would be great!

jyotish · July 11, 2020, 4:58pm

Hello @bob_wei @Mckiev

We added support for using wrappers. Please give it a try. https://github.com/AIcrowd/neurips2020-procgen-starter-kit/tree/master/envs

bob_wei · July 11, 2020, 5:25pm

Thanks @jyotish for the help! In this case, I guess the evaluation wrapper will still be the default one however?

jyotish · July 11, 2020, 5:49pm

Hello @bob_wei

Yes, the base env should be the ProcgenEnvWrapper provided in the starter kit. You can use any gym wrapper on top of this. If you use the env from gym.make instead of ProcgenEnvWrapper, the rollouts will fail.

anton_makiievskyi · July 12, 2020, 4:08am

Just to confirm, I understood correctly, since you provided the FrameStack wrapper as an example: in case the default ProcgenEnvWrapper would be used at evaluation, the evaluation must fail because of the changed observation shape, correct?

jyotish · July 12, 2020, 7:01am

Hello @Mckiev

I’m not sure if I understood that right. We will use the same env for training and rollouts. The requirements from our side are

The base env you use should be the env returned by ProcgenEnvWrapper rather than the one you get from gym.make.
The wrapper that you use should extend gym.Wrapper class (in case you are writing one on your own).

Right way to use custom wrappers:

registry.register_env(
    "my_custom_env",
    lambda config: MyWrapper(ProcgenEnvWrapper(config))
)

Wrong way to use custom wrappers:

registry.register_env(
    "my_custom_env",
    lambda config: MyWrapper(gym.make("procgen:procgen-coinrun-v0", **config))
)

During the evaluation (both training and rollouts), we will use the env with your custom wrapper (if any).

If you have a more complex use case (like you need to pass some custom env variables but they should not be passed to the base env),

def create_my_custom_env(config):
    my_var = config.pop("my_var")
    env = ProcgenEnvWrapper(config)
    env = MyWrapper(env, my_var)
    return env

registry.register_env(
    "my_custom_env", create_my_custom_env
)

I hope this covers what you wanted to know.

anton_makiievskyi · July 12, 2020, 8:36am

Oh, okay, now it’s clear!

Instead I thought you answered “yes” to the following question :

Why training with one wrapper and evaluating with another may be useful, is when for example I want to add random noize to observations during training, but then don’t apply noize during evaluation.

I don’t think current config that is passed to env constructor specifies whether it’s a training or evaluation environment, correct?

tim_whitaker · August 8, 2020, 12:20am

Has anyone found out a way to do what Anton asked? Is it possible for the evaluators to add a field in the “env_config:” part of the yaml configuration that says is_training: true or false? I remember trying some trickery in the warm up round by modifying run.sh but I don’t think it worked and I gave up on that idea.

the_raven_chaser · August 8, 2020, 12:55am

If we use frameskip, does the framework count the number of frames in the right way?

For example, if we use frame_skip=2, the number of interactions between the agent and environment is 8e6/2=4e6 when using only 8M frames. If we use the standard configuration which set timesteps_total=8000000, will this stop correctly?

jyotish · August 8, 2020, 6:32am

Hello @tim_whitaker @anton_makiievskyi

We are passing a rollout flag in env_config. During the training phase we set this to rollout = False and during the rollouts, this will be set to rollout = True.

tim_whitaker · August 9, 2020, 2:39pm

Thanks @jyotish. So can you confirm the config looks like this?

config:
    env_config:
        env_name: coinrun
        num_levels: 0
        start_level: 0
        paint_vel_info: False
        use_generated_assets: False
        distribution_mode: easy
        center_agent: True
        use_sequential_levels: False
        use_backgrounds: True
        restrict_themes: False
        use_monochrome_assets: False
        rollout: False
...

Can you post a quick example of where we could access in an env wrapper? I was trying something like what you posted above, but could not get it to work:

def create_env(config):
    rollout = config.pop("rollout")
    procgen = ProcgenEnvWrapper(config)
    env = MyWrapper(procgen, rollout)
    return env

registry.register_env(
    "my_wrapper", create_env,
)

Ray/rllib appears to call my create_env() function more than once and errors out because the rollout key was popped.

dipam_chakraborty · August 9, 2020, 7:23pm

@tim_whitaker Copy the config instead of popping the value directly, python is implicitly is having a “pass by reference” on the config.

tim_whitaker · August 9, 2020, 7:50pm

Ah of course. That makes sense. Thanks @dipam_chakraborty.

dipam_chakraborty · August 9, 2020, 8:09pm

This kind of points out a sort of loophole though, the env config can be modified from a wrapper (for example setting num_levels or changing other keys), which should be against competition rules. I think the organizers should explicitly mention this in the rules, else we could probably use “paint_vel_info” like the last round, though I don’t want to waste a submission to try it.

jyotish · August 10, 2020, 4:47am

Hello @dipam_chakraborty

Thanks for sharing your thoughts. This is the reason why we want the participants to use ProcgenEnvWrapper. The evaluation env_config shared in the other post will be forced at the base env used in your wrapper. So, it is not possible to override the config that we set using a wrapper.

the_raven_chaser · August 10, 2020, 9:57pm

Hi @jyotish

Could you please answer my above question?

If we use frameskip, does the framework count the number of frames in the right way?
For example, if we use frame_skip=2 , the number of interactions between the agent and environment is 8e6/2=4e6 when using only 8M frames. If we use the standard configuration which set timesteps_total=8000000 , will this stop correctly?

joao_schapke · August 13, 2020, 4:12pm

It probably will not. Rllib counts the steps based on the counter in each worker, which operates on the outmost wrapper of the environment. It’s also pretty easy to go over this number if implementing a custom trainer, so one must take care to not go over it.