RLlib custom env

Hi, I’m wondering if there is any way to add a custom wrapper around the procgen/gym env. I know there is already a wrapper being used, but we are not able to access it in submissions. Specifically, I want to be able to process the observations before they are sent to the model in ‘input_dict’.

Hello @bob_wei

You can use a custom preprocessor for this.
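
For reference, a custom preprocessor looks roughly like this (a minimal sketch; MyPreprocessor and the normalization step are placeholder choices, not part of the starter kit):

import numpy as np
from ray.rllib.models import ModelCatalog
from ray.rllib.models.preprocessors import Preprocessor

class MyPreprocessor(Preprocessor):
    def _init_shape(self, obs_space, options):
        # Shape of the transformed observation; unchanged in this sketch.
        return obs_space.shape

    def transform(self, observation):
        # Placeholder transform: scale pixel values to [0, 1].
        return observation.astype(np.float32) / 255.0

ModelCatalog.register_custom_preprocessor("my_prep", MyPreprocessor)
# Then point the trainer config at it:
# "model": {"custom_preprocessor": "my_prep"}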

Hi @jyotish, thanks for the reply. I don’t think I was specific enough: do you know of a better way to access multiple frames across time (e.g. for stacking frames)? I have tried using an RNN state-based approach to do this, but the inference performance and memory usage were not great.

Is it possible to use my own wrapper during training, but the default environment during evaluation?
It would be great!

Hello @bob_wei @Mckiev

We added support for using wrappers. Please give it a try. https://github.com/AIcrowd/neurips2020-procgen-starter-kit/tree/master/envs
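
For frame stacking in particular, a wrapper along these lines should work (a minimal sketch using the classic gym step/reset API; MyFrameStack is a placeholder name, and the starter kit ships its own FrameStack example under envs/):

from collections import deque

import gym
import numpy as np

class MyFrameStack(gym.Wrapper):
    def __init__(self, env, k):
        super().__init__(env)
        self.k = k
        self.frames = deque(maxlen=k)
        # Widen the observation space to hold k frames along the channel axis.
        low = np.repeat(env.observation_space.low, k, axis=-1)
        high = np.repeat(env.observation_space.high, k, axis=-1)
        self.observation_space = gym.spaces.Box(
            low=low, high=high, dtype=env.observation_space.dtype
        )

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        for _ in range(self.k):
            self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1), reward, done, info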

Thanks @jyotish for the help! In this case, I guess the wrapper used during evaluation will still be the default one, however?

Hello @bob_wei

Yes, the base env should be the ProcgenEnvWrapper provided in the starter kit. You can use any gym wrapper on top of this. If you use the env from gym.make instead of ProcgenEnvWrapper, the rollouts will fail.

Just to confirm that I understood correctly, since you provided the FrameStack wrapper as an example: if the default ProcgenEnvWrapper were used at evaluation, the evaluation would fail because of the changed observation shape, correct?

Hello @Mckiev

I’m not sure if I understood that right. We will use the same env for training and rollouts. The requirements from our side are:

  • The base env you use should be the env returned by ProcgenEnvWrapper rather than the one you get from gym.make.
  • The wrapper that you use should extend the gym.Wrapper class (in case you are writing one of your own).

Right way to use custom wrappers:

registry.register_env(
    "my_custom_env",
    lambda config: MyWrapper(ProcgenEnvWrapper(config))
)

Wrong way to use custom wrappers:

registry.register_env(
    "my_custom_env",
    lambda config: MyWrapper(gym.make("procgen:procgen-coinrun-v0", **config))
)

During the evaluation (both training and rollouts), we will use the env with your custom wrapper (if any).

If you have a more complex use case (for example, you need to pass some custom env variables that should not be passed to the base env), you can register a factory function instead:

def create_my_custom_env(config):
    my_var = config.pop("my_var")
    env = ProcgenEnvWrapper(config)
    env = MyWrapper(env, my_var)
    return env

registry.register_env(
    "my_custom_env", create_my_custom_env
)

I hope this covers what you wanted to know.

Oh, okay, now it’s clear!

Instead, I thought you answered “yes” to the following question:

An example of why training with one wrapper and evaluating with another may be useful: I might want to add random noise to observations during training, but then not apply the noise during evaluation.

I don’t think the current config that is passed to the env constructor specifies whether it’s a training or evaluation environment, correct?

Has anyone found a way to do what Anton asked? Is it possible for the evaluators to add a field in the “env_config:” part of the yaml configuration that says is_training: true or false? I remember trying some trickery in the warm-up round by modifying run.sh, but I don’t think it worked and I gave up on that idea.

If we use frameskip, does the framework count the number of frames in the right way?

For example, if we use frame_skip=2, the number of interactions between the agent and the environment is 8e6/2 = 4e6 when using only 8M frames. If we use the standard configuration, which sets timesteps_total=8000000, will this stop correctly?

Hello @tim_whitaker @anton_makiievskyi

We are passing a rollout flag in env_config. During the training phase we set this to rollout = False and during the rollouts, this will be set to rollout = True.
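
So the noise use case above can be handled with something like this (a hedged sketch; MyNoisyWrapper and noise_scale are hypothetical names):

import gym
import numpy as np

class MyNoisyWrapper(gym.ObservationWrapper):
    def __init__(self, env, rollout, noise_scale=5.0):
        super().__init__(env)
        self.rollout = rollout
        self.noise_scale = noise_scale

    def observation(self, obs):
        # Perturb observations only during training; rollouts stay clean.
        if self.rollout:
            return obs
        noise = np.random.normal(0.0, self.noise_scale, size=obs.shape)
        return np.clip(obs + noise, 0, 255).astype(obs.dtype)

registry.register_env(
    "my_custom_env",
    lambda config: MyNoisyWrapper(
        ProcgenEnvWrapper(config), config.get("rollout", False)
    )
)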

Thanks @jyotish. So can you confirm the config looks like this?

config:
    env_config:
        env_name: coinrun
        num_levels: 0
        start_level: 0
        paint_vel_info: False
        use_generated_assets: False
        distribution_mode: easy
        center_agent: True
        use_sequential_levels: False
        use_backgrounds: True
        restrict_themes: False
        use_monochrome_assets: False
        rollout: False
...

Can you post a quick example of how we could access this flag in an env wrapper? I was trying something like what you posted above, but could not get it to work:

def create_env(config):
    rollout = config.pop("rollout")
    procgen = ProcgenEnvWrapper(config)
    env = MyWrapper(procgen, rollout)
    return env

registry.register_env(
    "my_wrapper", create_env,
)

Ray/RLlib appears to call my create_env() function more than once and errors out because the rollout key was popped.

@tim_whitaker Copy the config instead of popping the value directly; Python effectively passes the config dict by reference, so popping mutates it for every subsequent call.
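
In other words, something like this (the copy is the essential fix; the rest mirrors the snippet above):

def create_env(config):
    config = dict(config)  # shallow copy, so the shared env_config is not mutated
    rollout = config.pop("rollout", False)
    procgen = ProcgenEnvWrapper(config)
    env = MyWrapper(procgen, rollout)
    return env

registry.register_env(
    "my_wrapper", create_env,
)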

Ah of course. That makes sense. Thanks @dipam_chakraborty.

This kind of points out a loophole though: the env config can be modified from a wrapper (for example, setting num_levels or changing other keys), which should be against competition rules. I think the organizers should explicitly mention this in the rules; otherwise we could probably use “paint_vel_info” like in the last round, though I don’t want to waste a submission to try it.

Hello @dipam_chakraborty

Thanks for sharing your thoughts. This is the reason why we want the participants to use ProcgenEnvWrapper. The evaluation env_config shared in the other post will be forced on the base env used in your wrapper. So, it is not possible to override the config that we set by using a wrapper.

Hi @jyotish

Could you please answer my above question?

If we use frameskip, does the framework count the number of frames in the right way?
For example, if we use frame_skip=2, the number of interactions between the agent and the environment is 8e6/2 = 4e6 when using only 8M frames. If we use the standard configuration, which sets timesteps_total=8000000, will this stop correctly?

It probably will not. RLlib counts the steps based on the counter in each worker, which operates on the outermost wrapper of the environment. It’s also pretty easy to go over this number when implementing a custom trainer, so one must take care not to exceed it.
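
To make the counting concrete: with a skip wrapper like the sketch below (MyFrameSkip is a hypothetical name), RLlib counts one timestep per outer step(), so timesteps_total=8000000 would allow up to 8e6 * skip base-env frames:

import gym

class MyFrameSkip(gym.Wrapper):
    def __init__(self, env, skip=2):
        super().__init__(env)
        self.skip = skip

    def step(self, action):
        # One outer step (what RLlib counts) drives `skip` base-env steps.
        total_reward = 0.0
        done = False
        info = {}
        obs = None
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info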
