So I was a little bored and decided to see how well I could play the Procgen games myself.
Setup (swapping `--env-name` for each game):
`python -m procgen.interactive --distribution-mode easy --vision agent --env-name coinrun`
First I tried each game for 5-10 episodes to figure out what the keys do, how the game works, etc.
Then I played each game 100 times and logged the rewards. Here are the results:
Environment | Mean reward | Mean normalized reward |
---|---|---|
bigfish | 29.40 | 0.728 |
bossfight | 10.15 | 0.772 |
caveflyer | 11.69 | 0.964 |
chaser | 11.23 | 0.859 |
climber | 12.34 | 0.975 |
coinrun | 9.80 | 0.960 |
dodgeball | 18.36 | 0.963 |
fruitbot | 25.15 | 0.786 |
heist | 10.00 | 1.000 |
jumper | 9.20 | 0.911 |
leaper | 9.90 | 0.988 |
maze | 10.00 | 1.000 |
miner | 12.27 | 0.937 |
ninja | 8.60 | 0.785 |
plunder | 29.46 | 0.979 |
starpilot | 33.15 | 0.498 |
The mean normalized score across all 16 games was 0.882. It stayed roughly constant over the 100 episodes, i.e. I didn’t improve much while playing.
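The normalized column appears to follow the usual Procgen min/max normalization, (R - R_min) / (R_max - R_min), with the easy-mode constants listed in the Procgen paper (Cobbe et al., 2020). Recomputing from the raw means in the table reproduces every normalized value to within rounding. A minimal sketch, assuming those constants (transcribed by hand here, not read from the procgen package):

```python
# Sketch: recompute normalized scores from the raw mean rewards in the table.
# ASSUMPTION: EASY_BOUNDS are the easy-mode (R_min, R_max) values from the
# Procgen paper, typed in by hand; they are not queried from the library.
EASY_BOUNDS = {
    "bigfish": (1.0, 40.0),   "bossfight": (0.5, 13.0),
    "caveflyer": (3.5, 12.0), "chaser": (0.5, 13.0),
    "climber": (2.0, 12.6),   "coinrun": (5.0, 10.0),
    "dodgeball": (1.5, 19.0), "fruitbot": (-1.5, 32.4),
    "heist": (3.5, 10.0),     "jumper": (1.0, 10.0),
    "leaper": (1.5, 10.0),    "maze": (5.0, 10.0),
    "miner": (1.5, 13.0),     "ninja": (3.5, 10.0),
    "plunder": (4.5, 30.0),   "starpilot": (2.5, 64.0),
}

# Mean raw reward per game over 100 episodes (copied from the table).
MEAN_REWARD = {
    "bigfish": 29.40, "bossfight": 10.15, "caveflyer": 11.69, "chaser": 11.23,
    "climber": 12.34, "coinrun": 9.80, "dodgeball": 18.36, "fruitbot": 25.15,
    "heist": 10.00, "jumper": 9.20, "leaper": 9.90, "maze": 10.00,
    "miner": 12.27, "ninja": 8.60, "plunder": 29.46, "starpilot": 33.15,
}


def normalize(env: str, reward: float) -> float:
    """Map a raw reward to [0, 1] using the per-game min/max bounds."""
    r_min, r_max = EASY_BOUNDS[env]
    return (reward - r_min) / (r_max - r_min)


scores = {env: normalize(env, r) for env, r in MEAN_REWARD.items()}
overall = sum(scores.values()) / len(scores)

for env, score in sorted(scores.items()):
    print(f"{env:10s} {score:.3f}")
print(f"{'mean':10s} {overall:.3f}")
```

Because the normalization is affine, normalizing the mean reward gives the same result as averaging per-episode normalized rewards, so either order works. Small per-game differences in the third decimal (e.g. chaser) come from the raw means in the table already being rounded to two decimals.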
I’m not sure how useful this result would be as a “human benchmark”, though: I could easily reach a normalized score of ~1.000 given enough time to think on each frame. Also, human visual reaction time is about 250 ms, which at 15 fps (~67 ms per frame) puts us roughly 4 frames (≈3.75) behind on every action. That lag matters in fast-paced games like starpilot, chaser and a few others.
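The frame-lag estimate is simple enough to check by hand, but here it is as a tiny sketch. The 250 ms reaction time and 15 fps are the rough figures quoted above, not measured values:

```python
import math

# ASSUMPTION: both inputs are the approximate figures from the paragraph above.
REACTION_TIME_S = 0.250  # typical human visual reaction time, in seconds
FPS = 15                 # frame rate the game is played at

frame_time_ms = 1000 / FPS                 # ~66.7 ms per frame
lag_frames = REACTION_TIME_S * FPS         # frames that pass before we react
first_response_frame = math.ceil(lag_frames)  # first whole frame we can act on

print(f"{frame_time_ms:.1f} ms per frame")
print(f"~{lag_frames:.2f} frames of lag, response lands on frame {first_response_frame}")
```

Rounding up is what turns 3.75 frames into "about 4": the earliest input that can react to a given frame only takes effect on the 4th frame after it.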