Getting Rmax from environment

jurgisp · July 28, 2020, 11:33am

Is it possible (and allowed) for the agent to get the Rmax setting of the current environment? This is used, for example, in the Procgen baselines paper (Appendix D.2) for the distributional min/max values.

jyotish · July 28, 2020, 11:43am

Hello @jurgisp

I do not think there is a way to get the max reward value from the gym environments. [refer]

The Rmin and Rmax for the publicly available environments are listed on github.com at openai/train-procgen/blob/master/train_procgen/constants.py#L39-L56:

EASY_GAME_RANGES = {
    'coinrun': [5, 10],
    'starpilot': [2.5, 64],
    'caveflyer': [3.5, 12],
    'dodgeball': [1.5, 19],
    'fruitbot': [-1.5, 32.4],
    'chaser': [.5, 13],
    'miner': [1.5, 13],
    'jumper': [1, 10],
    'leaper': [1.5, 10],
    'maze': [5, 10],
    'bigfish': [1, 40],
    'heist': [3.5, 10],
    'climber': [2, 12.6],
    'plunder': [4.5, 30],
    'ninja': [3.5, 10],
    'bossfight': [.5, 13],
}

For the private env used for round 1 (caterpillar), the min and max rewards are [R_min, R_max] = [8.25, 24].

Please note that Rmin is the score of an agent that has no access to the observations, not the minimum possible score.
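
As a rough illustration of how these ranges are commonly used, the sketch below rescales a raw episode return with (R - R_min) / (R_max - R_min). The names `GAME_RANGES` and `normalize_return` are hypothetical and only a few environments are included; the numbers themselves are the ones quoted above.

```python
# Hypothetical helper; only the numeric ranges come from this thread.
# Each entry is (R_min, R_max) for the named environment.
GAME_RANGES = {
    "coinrun": (5, 10),
    "starpilot": (2.5, 64),
    "caterpillar": (8.25, 24),  # private round 1 env
}


def normalize_return(env_name, raw_return):
    """Rescale a raw return so that R_min maps to 0 and R_max maps to 1."""
    r_min, r_max = GAME_RANGES[env_name]
    return (raw_return - r_min) / (r_max - r_min)


# R_min is the score of an observation-blind agent, not a hard floor,
# so a weak agent can end up with a negative normalized return.
print(normalize_return("coinrun", 7.5))
print(normalize_return("coinrun", 0))
```

Because Rmin is not the minimum possible score, the result should not be clipped to [0, 1] without thinking about what that hides.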

tim_whitaker · August 25, 2020, 7:43pm

@jyotish Curious how to reconcile this post with the new announcement of no environment-specific logic. Can we make use of the known max and min rewards of each specific environment?

kcobbe · August 27, 2020, 4:34pm

Yes, using the environment name to determine min/max rewards is perfectly fine.
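
A minimal sketch of what such a lookup could look like, assuming the agent is given the environment name as a string. `R_MAX_BY_ENV` and `r_max_for` are hypothetical names, and the table is abbreviated to a few of the values posted earlier in the thread.

```python
# Hypothetical lookup keyed by environment name; values are the R_max
# figures quoted earlier in this thread (abbreviated).
R_MAX_BY_ENV = {
    "coinrun": 10,
    "starpilot": 64,
    "bigfish": 40,
    "caterpillar": 24,
}


def r_max_for(env_name, default=None):
    """Return R_max for env_name, or `default` if the name is unknown."""
    return R_MAX_BY_ENV.get(env_name, default)


# Example: pick an upper bound for a distributional value head.
# The lower bound is left to the caller, since the posted R_min values
# are not the minimum possible scores.
v_max = r_max_for("bigfish")
```

Passing an explicit `default` keeps the agent usable on an environment that is missing from the table.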