I’m a little bit confused. My mean reward is 8.134, yet my mean normalized reward is 0. I have 3 questions:

  • What are R_min and R_max used to calculate the normalized score for coinrun?
  • How many episodes are used to calculate the evaluation scores?
  • How are these evaluation episodes (levels) different from the episodes given in training?

Something went wrong with the rollouts for two of your submissions. We are looking into it.

We are using these values for R_min and R_max.
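
For reference, a normalized score of this kind is usually a linear rescaling, (R - R_min) / (R_max - R_min). Here is a minimal sketch, assuming that formula. The function name and the bounds below are placeholders for illustration, not the evaluator's confirmed constants:

```python
def normalized_reward(mean_reward: float, r_min: float, r_max: float) -> float:
    """Linearly rescale a mean reward so r_min maps to 0 and r_max maps to 1."""
    if r_max <= r_min:
        raise ValueError("r_max must be greater than r_min")
    return (mean_reward - r_min) / (r_max - r_min)


# Placeholder bounds, assumed for illustration only.
R_MIN, R_MAX = 5.0, 10.0

# Mean reward taken from the question above.
score = normalized_reward(8.134, R_MIN, R_MAX)
print(f"normalized reward: {score:.4f}")
```

Under this formula, a mean reward of 8.134 only normalizes to exactly 0 if R_min is itself 8.134, so a reported 0 is more likely a symptom of the rollout problem than of the scoring bounds.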

The rollouts will be run for 1000 episodes with at most 1000 steps per episode.
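
To make the episode and step limits concrete, here is a rough sketch of an evaluation loop with a per-episode step cap. `ToyEnv`, `evaluate`, and the constant policy are hypothetical stand-ins so the example runs on its own. They are not the actual evaluator or environment:

```python
import random


class ToyEnv:
    """Hypothetical stand-in environment, not the competition's environment."""

    def __init__(self, seed: int):
        self.rng = random.Random(seed)

    def reset(self) -> int:
        return 0  # dummy observation

    def step(self, action: int):
        # Each step has a 1% chance of ending the episode with a reward of 10.
        done = self.rng.random() < 0.01
        reward = 10.0 if done else 0.0
        return 0, reward, done


def evaluate(policy, num_episodes: int = 1000, max_steps: int = 1000) -> float:
    """Return the mean episode return, cutting each episode off at max_steps."""
    returns = []
    for episode in range(num_episodes):
        env = ToyEnv(seed=episode)  # one seed per episode, assumed for illustration
        obs = env.reset()
        total = 0.0
        for _ in range(max_steps):
            obs, reward, done = env.step(policy(obs))
            total += reward
            if done:
                break
        returns.append(total)
    return sum(returns) / len(returns)


# Small run with a constant policy; the real rollouts use 1000 episodes.
mean_reward = evaluate(lambda obs: 0, num_episodes=20, max_steps=1000)
print(f"mean reward over 20 episodes: {mean_reward:.3f}")
```

An episode that hits the step cap without finishing simply contributes whatever reward it collected up to that point.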