Min and Max rewards for an environment

jyotish · August 31, 2020, 5:06pm

Hello!

The min and max rewards for an environment are available as part of env_config. For example:

from ray.tune import registry
from envs.procgen_env_wrapper import ProcgenEnvWrapper

def my_custom_env(env_config):
    return_min = env_config["return_min"]
    return_blind = env_config["return_blind"]
    return_max = env_config["return_max"]
    env = ProcgenEnvWrapper(env_config)
    return env

registry.register_env(
    "my_custom_env",  # This should be different from procgen_env_wrapper
    my_custom_env,
)

Alternatively, these values are also accessible as attributes of the ProcgenEnvWrapper instance. For example:

from ray.tune import registry
from envs.procgen_env_wrapper import ProcgenEnvWrapper

def my_custom_env(env_config):
    env = ProcgenEnvWrapper(env_config)
    return_min = env.return_min
    return_blind = env.return_blind
    return_max = env.return_max
    return env

registry.register_env(
    "my_custom_env",  # This should be different from procgen_env_wrapper
    my_custom_env,
)

return_min is an approximate lower bound on the value function (the minimum possible return).
return_blind is the return obtained by an agent with no access to the observations.
return_max is the maximum possible return for the environment.
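
As a rough illustration of how these bounds might be used, the sketch below rescales an episode return onto [0, 1]. The helper name normalize_return is hypothetical and not part of ProcgenEnvWrapper, and the numbers in the usage line are made up; the clipping is an assumption that follows from return_min being only an approximate lower bound.

```python
def normalize_return(episode_return, return_min, return_max):
    """Map an episode return onto [0, 1] using the environment's bounds.

    Hypothetical helper for illustration; not part of ProcgenEnvWrapper.
    """
    span = return_max - return_min
    if span <= 0:
        raise ValueError("return_max must be greater than return_min")
    scaled = (episode_return - return_min) / span
    # Clip the result, since return_min is only an approximate bound.
    return min(1.0, max(0.0, scaled))


# Made-up values: a return of 5.0 in an environment bounded by [0.0, 10.0].
print(normalize_return(5.0, 0.0, 10.0))
```

The same function could be called with return_blind in place of return_min to measure improvement over a blind agent, if that baseline is more useful.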

dipam_chakraborty · August 31, 2020, 8:34pm

So do we need to use the attributes provided, or can we program our own? For example, for distributional RL we need the actual minimum rather than the “blind observation training” minimum given.
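
To make the distributional RL case concrete, here is a minimal sketch of the fixed support that a categorical (C51-style) agent spreads its value distribution over. The function name make_support and the default of 51 atoms are assumptions for illustration, as is the choice to use the environment's minimum and maximum return as the support bounds.

```python
def make_support(v_min, v_max, num_atoms=51):
    """Return num_atoms evenly spaced values from v_min to v_max.

    Hypothetical helper for illustration; v_min and v_max are assumed
    to be the environment's minimum and maximum return.
    """
    if num_atoms < 2:
        raise ValueError("num_atoms must be at least 2")
    if v_max <= v_min:
        raise ValueError("v_max must be greater than v_min")
    delta = (v_max - v_min) / (num_atoms - 1)
    return [v_min + i * delta for i in range(num_atoms)]


# Made-up bounds: 11 atoms spaced 1.0 apart between 0.0 and 10.0.
atoms = make_support(0.0, 10.0, num_atoms=11)
print(len(atoms), atoms[0], atoms[1])
```

If the lower bound is set too high, returns below it get projected onto the first atom, which is why the true minimum matters here.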

jyotish · August 31, 2020, 9:29pm

Hello @dipam_chakraborty

Thanks for pointing this out. We made a few changes and now provide min, blind and max reward values. We are also using “return” instead of “reward”, so that these values aren’t confused with single step rewards.

asuka · September 5, 2020, 9:44am

So does this mean that when running locally we can specify these values in the *.yaml file, and that the values will be provided (overwritten) during submission?