RLlib custom env

Hi, I’m wondering if there is any way to add a custom wrapper around the procgen/gym env. I know a wrapper is already being used, but we can’t access it in submissions. Specifically, I want to process the observations before they are sent to the model in ‘input_dict’.

Hello @bob_wei

You can use a custom preprocessor for this.

Hi @jyotish, thanks for the reply. I don’t think I was specific enough: do you know of a good way to access multiple frames across time (e.g. for stacking frames)? I have tried an RNN-state-based approach, but its inference performance and memory usage are not great.

Is it possible to use my own wrapper during training, but the default environment during evaluation?
That would be great!

Hello @bob_wei @Mckiev

We added support for using wrappers. Please give it a try. https://github.com/AIcrowd/neurips2020-procgen-starter-kit/tree/master/envs

Thanks @jyotish for the help! In this case, I guess evaluation will still use the default wrapper, though?

Hello @bob_wei

Yes, the base env should be the ProcgenEnvWrapper provided in the starter kit. You can use any gym wrapper on top of this. If you use the env from gym.make instead of ProcgenEnvWrapper, the rollouts will fail.
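For the frame-stacking use case raised earlier in the thread, the core of such a wrapper is a small rolling buffer. The following is only a sketch of that buffering logic in plain Python; `FrameBuffer` is a hypothetical helper, not part of the starter kit or of gym. In a real wrapper extending gym.Wrapper, reset() would call `buffer.reset(obs)`, step() would call `buffer.append(obs)`, and the wrapper would also need to declare an observation space matching the stacked shape.

```python
from collections import deque


class FrameBuffer:
    """Keeps the last `num_frames` observations (hypothetical helper)."""

    def __init__(self, num_frames):
        if num_frames < 1:
            raise ValueError("num_frames must be at least 1")
        self.num_frames = num_frames
        self.frames = deque(maxlen=num_frames)

    def reset(self, first_obs):
        # On episode start, fill the whole buffer with the first frame
        # so the stacked observation always has the same length.
        self.frames.clear()
        for _ in range(self.num_frames):
            self.frames.append(first_obs)
        return self.stacked()

    def append(self, obs):
        # deque(maxlen=...) drops the oldest frame automatically.
        self.frames.append(obs)
        return self.stacked()

    def stacked(self):
        # Oldest frame first, newest last.
        return list(self.frames)
```

Because the buffer lives in the env wrapper rather than in the model state, the model only ever sees one fixed-size stacked observation per step.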

Just to confirm that I understood correctly, since you provided the FrameStack wrapper as an example: if the default ProcgenEnvWrapper were used at evaluation, the evaluation would fail because of the changed observation shape, correct?

Hello @Mckiev

I’m not sure I understood that right. We will use the same env for training and rollouts. The requirements from our side are:

  • The base env you use should be the env returned by ProcgenEnvWrapper rather than the one you get from gym.make.
  • The wrapper that you use should extend the gym.Wrapper class (in case you are writing one on your own).

Right way to use custom wrappers:

registry.register_env(
    "my_custom_env",
    lambda config: MyWrapper(ProcgenEnvWrapper(config))
)

Wrong way to use custom wrappers:

registry.register_env(
    "my_custom_env",
    lambda config: MyWrapper(gym.make("procgen:procgen-coinrun-v0", **config))
)

During the evaluation (both training and rollouts), we will use the env with your custom wrapper (if any).

If you have a more complex use case (for example, you need to pass custom env variables that should not be passed to the base env), you can register a factory function instead:

def create_my_custom_env(config):
    my_var = config.pop("my_var")
    env = ProcgenEnvWrapper(config)
    env = MyWrapper(env, my_var)
    return env

registry.register_env(
    "my_custom_env", create_my_custom_env
)
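One detail worth noting about the factory above: `config.pop` modifies the dict that was passed in. Whether that matters depends on how the caller reuses the config, which this thread does not specify. A minimal sketch of a non-mutating alternative follows; `split_env_config` is a hypothetical helper name, and the sample values are illustrative only.

```python
def split_env_config(config, wrapper_keys):
    """Return (base_config, wrapper_kwargs) without modifying `config`."""
    base_config = dict(config)  # shallow copy so the caller's dict is untouched
    wrapper_kwargs = {}
    for key in wrapper_keys:
        if key in base_config:
            wrapper_kwargs[key] = base_config.pop(key)
    return base_config, wrapper_kwargs


# Illustrative config: "my_var" is meant for the wrapper only.
original = {"env_name": "coinrun", "my_var": 4}
base_config, wrapper_kwargs = split_env_config(original, ["my_var"])
```

Inside a factory, `base_config` would then go to ProcgenEnvWrapper and `wrapper_kwargs` to the custom wrapper.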

I hope this covers what you wanted to know.

Oh, okay, now it’s clear!

I had thought you answered “yes” to the following question:

Training with one wrapper and evaluating with another can be useful when, for example, I want to add random noise to observations during training but not during evaluation.

I don’t think the current config passed to the env constructor specifies whether it’s a training or an evaluation environment, correct?
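One possible way to get this behaviour, sketched under assumptions: the wrapper reads a flag from the env config and only adds noise when the flag is set. The keys `is_training` and `noise_std` are hypothetical and would have to be supplied by whoever builds the config. RLlib's trainer config does have an `evaluation_config` entry that can override `env_config` for evaluation workers, but whether the competition's evaluator honours it is not stated in this thread.

```python
import random


def maybe_add_noise(pixels, noise_std, training, rng=random):
    """Add Gaussian noise to a flat list of pixel values when `training` is True."""
    if not training or noise_std <= 0:
        return list(pixels)
    noisy = []
    for value in pixels:
        value = value + rng.gauss(0.0, noise_std)
        # Clamp to the valid 8-bit pixel range.
        noisy.append(min(255.0, max(0.0, value)))
    return noisy


def noise_settings_from_config(config):
    """Pop the hypothetical wrapper-only keys out of a copy of the env config."""
    base_config = dict(config)
    # Default to no noise when the flag is absent.
    training = base_config.pop("is_training", False)
    noise_std = base_config.pop("noise_std", 0.0)
    return base_config, training, noise_std
```

Defaulting to no noise when the flag is missing means an environment built without the extra keys behaves like the unmodified one.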