Hello!
The min and max rewards for an environment are available as part of env_config. For example:

    from ray.tune import registry
    from envs.procgen_env_wrapper import ProcgenEnvWrapper

    def my_custom_env(env_config):
        # Reward bounds are read directly from the env_config dict
        return_min = env_config["return_min"]
        return_blind = env_config["return_blind"]
        return_max = env_config["return_max"]
        env = ProcgenEnvWrapper(env_config)
        return env

    registry.register_env(
        "my_custom_env",  # This should be different from procgen_env_wrapper
        my_custom_env,
    )

Alternatively, these values are also accessible via the ProcgenEnvWrapper. For example:

    from ray.tune import registry
    from envs.procgen_env_wrapper import ProcgenEnvWrapper

    def my_custom_env(env_config):
        env = ProcgenEnvWrapper(env_config)
        # Reward bounds are exposed as attributes on the wrapper
        return_min = env.return_min
        return_blind = env.return_blind
        return_max = env.return_max
        return env

    registry.register_env(
        "my_custom_env",  # This should be different from procgen_env_wrapper
        my_custom_env,
    )

- return_min is an approximate lower bound on the value function (minimum possible reward)
- return_blind is the reward obtained by an agent with no access to the observations
- return_max is the maximum possible reward for the environment
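
One thing you might do with these bounds is rescale an episode return to roughly the [0, 1] range. Below is a minimal sketch of that idea. The helper name normalize_return and the numbers in example_bounds are made up for illustration and are not part of the starter kit or of ProcgenEnvWrapper. Since return_min is only an approximate lower bound, the result could occasionally fall slightly below 0.

```python
def normalize_return(episode_return, return_min, return_max):
    """Rescale a raw episode return using the environment's reward bounds."""
    span = return_max - return_min
    if span <= 0:
        raise ValueError("return_max must be greater than return_min")
    return (episode_return - return_min) / span


# Hypothetical bounds, for illustration only; in practice read them from
# env_config or from the wrapper as shown above.
example_bounds = {"return_min": 0.0, "return_blind": 1.0, "return_max": 10.0}

score = normalize_return(
    5.0,
    example_bounds["return_min"],
    example_bounds["return_max"],
)
print(score)
```

The same helper applied to return_blind gives the normalized score of an agent that ignores observations, which can serve as a baseline to compare against.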