Hi, I’m wondering if there is any way to add a custom wrapper around the procgen/gym env. I know there is already a wrapper being used, but we are not able to access that in submissions. Specifically, I want to be able to process the observations before they are sent to the model in `input_dict`.
Hi @jyotish, thanks for the reply. I don’t think I was specific enough, but do you know of a better way to access multiple frames across time (e.g. for stacking frames)? I have tried using an RNN state-based approach to do this, but inference performance and memory usage were not great.
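For reference, the core of the frame-stacking alternative being asked about can be sketched roughly as below. This is only an illustration: the class name and the `k` parameter are made up, NumPy is assumed, and a real submission would extend `gym.Wrapper` (as the organizers require later in this thread); here the wrapper is duck-typed so the stacking logic stands on its own.

```python
import numpy as np
from collections import deque

class FrameStack:
    """Illustrative sketch: keep the last k observations and
    concatenate them along the channel axis."""

    def __init__(self, env, k=4):
        self.env = env
        self.k = k
        self.frames = deque(maxlen=k)  # holds the k most recent frames

    def reset(self):
        obs = self.env.reset()
        # Fill the buffer with copies of the first frame.
        for _ in range(self.k):
            self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)  # oldest frame is dropped automatically
        return np.concatenate(self.frames, axis=-1), reward, done, info
```

Note the stacked observation shape is `(H, W, C * k)`, which is why the observation space must be adjusted accordingly in a real wrapper.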
Is it possible to use my own wrapper during training, but default environment during evaluation?
It would be great!
We added support for using wrappers. Please give it a try. https://github.com/AIcrowd/neurips2020-procgen-starter-kit/tree/master/envs
Thanks @jyotish for the help! In this case, I guess the evaluation wrapper will still be the default one, though?
Hello @bob_wei
Yes, the base env should be the `ProcgenEnvWrapper` provided in the starter kit. You can use any gym wrapper on top of this. If you use the env from `gym.make` instead of `ProcgenEnvWrapper`, the rollouts will fail.
Just to confirm that I understood correctly, since you provided the FrameStack wrapper as an example: if the default `ProcgenEnvWrapper` were used at evaluation, the evaluation would have to fail because of the changed observation shape, correct?
Hello @Mckiev
I’m not sure if I understood that right. We will use the same env for training and rollouts. The requirements from our side are:
- The base env you use should be the env returned by `ProcgenEnvWrapper`, rather than the one you get from `gym.make`.
- The wrapper that you use should extend the `gym.Wrapper` class (in case you are writing one on your own).
Right way to use custom wrappers:

```python
registry.register_env(
    "my_custom_env",
    lambda config: MyWrapper(ProcgenEnvWrapper(config))
)
```
Wrong way to use custom wrappers:

```python
registry.register_env(
    "my_custom_env",
    lambda config: MyWrapper(gym.make("procgen:procgen-coinrun-v0", **config))
)
```
During the evaluation (both training and rollouts), we will use the env with your custom wrapper (if any).
If you have a more complex use case (for example, you need to pass some custom env variables that should not be passed to the base env), you can register a function instead:
```python
def create_my_custom_env(config):
    my_var = config.pop("my_var")
    env = ProcgenEnvWrapper(config)
    env = MyWrapper(env, my_var)
    return env

registry.register_env(
    "my_custom_env", create_my_custom_env
)
```
I hope this covers what you wanted to know.
Oh, okay, now it’s clear!
Instead, I thought you had answered “yes” to the following question:
Training with one wrapper and evaluating with another may be useful when, for example, I want to add random noise to observations during training, but not apply that noise during evaluation.
I don’t think the current config passed to the env constructor specifies whether it’s a training or evaluation environment, correct?
Has anyone found a way to do what Anton asked? Is it possible for the evaluators to add a field in the `env_config:` part of the yaml configuration that says `is_training: true` or `false`? I remember trying some trickery in the warm-up round by modifying `run.sh`, but I don’t think it worked and I gave up on that idea.
If we use frameskip, does the framework count the number of frames in the right way?
For example, if we use `frame_skip=2`, the number of interactions between the agent and the environment is 8e6/2 = 4e6 when using only 8M frames. If we use the standard configuration, which sets `timesteps_total=8000000`, will this stop correctly?
Hello @tim_whitaker @anton_makiievskyi
We are passing a `rollout` flag in `env_config`. During the training phase we set this to `rollout = False`, and during the rollouts it will be set to `rollout = True`.
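Given this flag, the train-time-noise idea from earlier in the thread could be sketched roughly as below. This is a hypothetical illustration: the wrapper name and noise scale are made up, and only the `rollout` key in `env_config` comes from the organizers’ description. A real submission would extend `gym.Wrapper`; here the wrapper is duck-typed so the sketch is self-contained.

```python
import numpy as np

class NoiseWrapper:
    """Hypothetical wrapper: perturbs observations during training only.
    `rollout` is True during evaluation rollouts, per env_config."""

    def __init__(self, env, rollout, scale=8):
        self.env = env
        self.rollout = rollout
        self.scale = scale  # illustrative noise magnitude

    def _maybe_noise(self, obs):
        if self.rollout:
            return obs  # evaluation: pass observations through untouched
        noise = np.random.randint(-self.scale, self.scale + 1, size=obs.shape)
        # Widen dtype before adding so the sum does not wrap around,
        # then clip back into valid pixel range.
        return np.clip(obs.astype(np.int16) + noise, 0, 255).astype(np.uint8)

    def reset(self):
        return self._maybe_noise(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._maybe_noise(obs), reward, done, info
```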
Thanks @jyotish. So can you confirm the config looks like this?
```yaml
config:
  env_config:
    env_name: coinrun
    num_levels: 0
    start_level: 0
    paint_vel_info: False
    use_generated_assets: False
    distribution_mode: easy
    center_agent: True
    use_sequential_levels: False
    use_backgrounds: True
    restrict_themes: False
    use_monochrome_assets: False
    rollout: False
    ...
```
Can you post a quick example of where we could access this flag in an env wrapper? I was trying something like what you posted above, but could not get it to work:
```python
def create_env(config):
    rollout = config.pop("rollout")
    procgen = ProcgenEnvWrapper(config)
    env = MyWrapper(procgen, rollout)
    return env

registry.register_env(
    "my_wrapper", create_env,
)
```
Ray/RLlib appears to call my `create_env()` function more than once and errors out because the `rollout` key was popped.
@tim_whitaker Copy the config instead of popping the value directly; Python effectively passes the config by reference, so `pop` mutates the same dict that gets reused on later calls.
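A corrected version of the snippet above, following that advice, might look like this. It is a sketch: `ProcgenEnvWrapper` and `MyWrapper` come from the starter kit and are stubbed out here only so the example is self-contained, and it assumes `config` behaves like a plain dict.

```python
# Stand-ins for the starter kit's classes, just so the sketch runs:
class ProcgenEnvWrapper:
    def __init__(self, config):
        self.config = config

class MyWrapper:
    def __init__(self, env, rollout):
        self.env, self.rollout = env, rollout

def create_env(config):
    # Copy first: RLlib may call this factory several times with the same
    # config object, and pop() on the original would mutate it in place,
    # so the "rollout" key would be missing on the second call.
    config = dict(config)
    rollout = config.pop("rollout")
    env = ProcgenEnvWrapper(config)
    return MyWrapper(env, rollout)
```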
This points out a sort of loophole, though: the env config can be modified from a wrapper (for example, setting `num_levels` or changing other keys), which should be against competition rules. I think the organizers should explicitly mention this in the rules; otherwise we could probably use `paint_vel_info` like in the last round, though I don’t want to waste a submission to try it.
Hello @dipam_chakraborty
Thanks for sharing your thoughts. This is the reason why we want the participants to use `ProcgenEnvWrapper`. The evaluation `env_config` shared in the other post will be forced on the base env used in your wrapper, so it is not possible to override the config that we set using a wrapper.
Hi @jyotish
Could you please answer my above question?
If we use frameskip, does the framework count the number of frames in the right way?
For example, if we use `frame_skip=2`, the number of interactions between the agent and the environment is 8e6/2 = 4e6 when using only 8M frames. If we use the standard configuration, which sets `timesteps_total=8000000`, will this stop correctly?
It probably will not. RLlib counts the steps based on the counter in each worker, which operates on the outermost wrapper of the environment. It’s also pretty easy to go over this number when implementing a custom trainer, so one must take care not to exceed it.
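The counting issue can be made concrete with a minimal skip wrapper (an illustrative sketch; the class name is made up). If the trainer counts outer-wrapper steps, then `timesteps_total=8000000` counted at the outer level would consume `frame_skip × 8M` underlying frames, so the budget must be lowered by hand.

```python
class FrameSkip:
    """Illustrative sketch: one outer step = `skip` inner env frames.
    A trainer counting outer steps undercounts frames by a factor of `skip`."""

    def __init__(self, env, skip=2):
        self.env = env
        self.skip = skip
        self.inner_frames = 0  # frames actually consumed from the base env

    def step(self, action):
        total_reward, done, info, obs = 0.0, False, {}, None
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            self.inner_frames += 1
            total_reward += reward
            if done:
                break  # do not step a finished episode
        return obs, total_reward, done, info
```

So ten outer steps with `skip=2` consume twenty real frames, which is exactly the discrepancy the question is about.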