Human score

So I was a little bored and decided to see how well I could play the procgen games myself.

Setup:

python -m procgen.interactive --distribution-mode easy --vision agent --env-name coinrun

First I tried each game for 5-10 episodes to figure out what the keys do, how the game works, etc.
Then I played each game 100 times and logged the rewards. Here are the results:

Environment   Mean reward   Mean normalized reward
bigfish       29.40         0.728
bossfight     10.15         0.772
caveflyer     11.69         0.964
chaser        11.23         0.859
climber       12.34         0.975
coinrun        9.80         0.960
dodgeball     18.36         0.963
fruitbot      25.15         0.786
heist         10.00         1.000
jumper         9.20         0.911
leaper         9.90         0.988
maze          10.00         1.000
miner         12.27         0.937
ninja          8.60         0.785
plunder       29.46         0.979
starpilot     33.15         0.498

The mean normalized score over all games was 0.882. It stayed relatively constant throughout the 100 episodes, i.e. I didn’t improve much while playing.
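For anyone who wants to check the aggregate, here is a minimal sketch that recomputes the overall mean from the per-game normalized scores in the table. The `normalize` helper is the usual min-max normalization. The coinrun bounds passed to it (`r_min=5`, `r_max=10`) are an assumption on my part, chosen because they reproduce the 0.960 in the table; the post itself does not list the constants used.

```python
# Normalized scores copied from the results table.
normalized = {
    "bigfish": 0.728, "bossfight": 0.772, "caveflyer": 0.964, "chaser": 0.859,
    "climber": 0.975, "coinrun": 0.960, "dodgeball": 0.963, "fruitbot": 0.786,
    "heist": 1.000, "jumper": 0.911, "leaper": 0.988, "maze": 1.000,
    "miner": 0.937, "ninja": 0.785, "plunder": 0.979, "starpilot": 0.498,
}


def normalize(reward, r_min, r_max):
    """Min-max normalization: 0 at r_min, 1 at r_max."""
    return (reward - r_min) / (r_max - r_min)


# Unweighted mean over all 16 games.
mean_normalized = sum(normalized.values()) / len(normalized)
print(f"{mean_normalized:.3f}")  # 0.882

# Assumed coinrun bounds (r_min=5, r_max=10); not stated in the post.
print(f"{normalize(9.80, 5.0, 10.0):.3f}")  # 0.960
```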

I’m not sure how useful this result is as a “human benchmark”, though: I could easily reach a score of ~1.000 given enough time to think about each frame. Also, human visual reaction time is ~250 ms, which at 15 fps means our actions lag by roughly 4 frames (0.25 s × 15 fps = 3.75). That can matter in games like starpilot, chaser and some others.
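The frame-lag estimate is just reaction time multiplied by frame rate. A quick sketch of the arithmetic, using the ~250 ms and 15 fps figures from above (the variable names are mine):

```python
import math

reaction_time_s = 0.250  # approximate human visual reaction time
fps = 15                 # frame rate quoted in the post

# How many frames go by before a human can respond to what is on screen.
frames_behind = reaction_time_s * fps
print(frames_behind)             # 3.75
print(math.ceil(frames_behind))  # 4 whole frames
```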
