So I was a little bored and decided to see how well I could play the procgen games myself.
Setup:
`python -m procgen.interactive --distribution-mode easy --vision agent --env-name coinrun`
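(Side note: if you'd rather script the reward bookkeeping than read it off the interactive window, procgen also registers regular gym environments. A rough sketch of what that loop could look like, with random actions standing in for the human player and the older 4-tuple gym step API assumed:)

```python
# Rough sketch: logging per-episode rewards through procgen's gym registration
# (random actions stand in for actual human play; assumes the older gym API
# where step() returns a 4-tuple).
import gym
import numpy as np

env = gym.make("procgen:procgen-coinrun-v0", distribution_mode="easy")

episode_returns = []
for _ in range(100):
    env.reset()
    done, ep_return = False, 0.0
    while not done:
        _, reward, done, _ = env.step(env.action_space.sample())
        ep_return += reward
    episode_returns.append(ep_return)

print("mean reward over 100 episodes:", np.mean(episode_returns))
```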
First I played each game for 5-10 episodes to figure out what the keys do, how each game works, etc.
Then I played each game 100 times and logged the rewards. Here are the results:
| Environment | Mean reward | Mean normalized reward |
|---|---|---|
| bigfish | 29.40 | 0.728 |
| bossfight | 10.15 | 0.772 |
| caveflyer | 11.69 | 0.964 |
| chaser | 11.23 | 0.859 |
| climber | 12.34 | 0.975 |
| coinrun | 9.80 | 0.960 |
| dodgeball | 18.36 | 0.963 |
| fruitbot | 25.15 | 0.786 |
| heist | 10.00 | 1.000 |
| jumper | 9.20 | 0.911 |
| leaper | 9.90 | 0.988 |
| maze | 10.00 | 1.000 |
| miner | 12.27 | 0.937 |
| ninja | 8.60 | 0.785 |
| plunder | 29.46 | 0.979 |
| starpilot | 33.15 | 0.498 |
The mean normalized score over all games was 0.882. My per-episode scores stayed relatively constant throughout the 100 episodes, i.e. I didn’t improve much while playing.
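For anyone wondering, “normalized” here is the usual Procgen normalization, (R - R_min) / (R_max - R_min) with the per-game easy-mode constants. A quick sanity-check sketch of how the table reduces to the 0.882 figure (the coinrun and bigfish constants are the ones I believe the paper lists; the list of normalized scores is just copied from the table above):

```python
# Sanity check on the aggregate: normalized score = (R - R_min) / (R_max - R_min),
# using the per-game easy-mode constants from the Procgen paper
# (e.g. coinrun uses (5, 10) and bigfish uses (1, 40), if I'm reading it right).
def normalize(r, r_min, r_max):
    return (r - r_min) / (r_max - r_min)

assert round(normalize(9.80, 5, 10), 3) == 0.960   # coinrun row above
assert round(normalize(29.40, 1, 40), 3) == 0.728  # bigfish row above

# Normalized scores copied straight from the table, in the same order.
normalized = [0.728, 0.772, 0.964, 0.859, 0.975, 0.960, 0.963, 0.786,
              1.000, 0.911, 0.988, 1.000, 0.937, 0.785, 0.979, 0.498]
print(sum(normalized) / len(normalized))  # ~0.882
```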
I’m not sure how useful this result is as a “human benchmark” though: I could easily reach a score of ~1.000 given enough time to think on each frame. Also, human visual reaction time is ~250 ms, which at 15 fps puts us roughly 4 frames behind on our actions (0.25 s × 15 frames/s ≈ 3.75 frames); that can matter in games like starpilot, chaser and some others.