Mean Reward and Mean Normalized Reward

Hi everyone, I would like to know how the mean reward and the mean normalized reward are calculated for the evaluations.

You can check out the information in the flatland-rl documentation here.

The scores of your submission are computed as follows:

  1. Mean number of agents done, in other words how many agents reached their target in time.
  2. Mean reward is just the mean of the cumulative reward.
  3. If multiple participants have the same number of done agents, we compute a “normalized” reward as follows:

    normalized_reward = cumulative_reward / (self.env._max_episode_steps + self.env.get_num_agents())
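To make the three scores concrete, here is a minimal Python sketch of how they could be aggregated over several evaluation episodes. It is not the official evaluator. The `EpisodeResult` record and its field names are hypothetical, and the sketch assumes each score is a plain average over episodes, with the normalized reward computed per episode using the formula above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EpisodeResult:
    # Hypothetical per-episode record; field names are illustrative only.
    agents_done: int          # agents that reached their target in time
    cumulative_reward: float  # sum of all rewards collected in the episode
    max_episode_steps: int    # stands in for self.env._max_episode_steps
    num_agents: int           # stands in for self.env.get_num_agents()


def normalized_reward(ep: EpisodeResult) -> float:
    # Same formula as in the post: divide by (max episode steps + number of agents).
    return ep.cumulative_reward / (ep.max_episode_steps + ep.num_agents)


def score(episodes: list[EpisodeResult]) -> tuple[float, float, float]:
    """Return (mean agents done, mean reward, mean normalized reward)."""
    return (
        mean(ep.agents_done for ep in episodes),
        mean(ep.cumulative_reward for ep in episodes),
        mean(normalized_reward(ep) for ep in episodes),
    )


# Two made-up episodes, just to exercise the functions.
episodes = [
    EpisodeResult(agents_done=3, cumulative_reward=-120.0, max_episode_steps=100, num_agents=5),
    EpisodeResult(agents_done=4, cumulative_reward=-60.0, max_episode_steps=55, num_agents=5),
]
print(score(episodes))
```

Dividing by `max_episode_steps + num_agents` puts episodes of different length and size on a comparable scale before averaging, which is presumably why the normalized value is used for tie-breaking rather than the raw mean reward.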

The mean number of agents done is the primary score value; only when it is tied do we use the “normalized” reward to determine the position on the leaderboard.
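The ranking rule can be sketched as a sort on a two-part key. This is only an illustration: the `Submission` tuple and its fields are hypothetical, and it assumes a higher normalized reward ranks higher (Flatland rewards are penalties, so a value closer to zero is better).

```python
from typing import NamedTuple


class Submission(NamedTuple):
    # Hypothetical leaderboard entry; names are illustrative only.
    participant: str
    mean_agents_done: float
    mean_normalized_reward: float


def rank(submissions: list[Submission]) -> list[Submission]:
    # Primary key: mean agents done (higher is better).
    # Tie-breaker: mean normalized reward (higher, i.e. less negative, is better).
    return sorted(
        submissions,
        key=lambda s: (s.mean_agents_done, s.mean_normalized_reward),
        reverse=True,
    )


board = rank([
    Submission("a", 3.5, -1.20),
    Submission("b", 4.0, -1.50),
    Submission("c", 3.5, -0.90),
])
print([s.participant for s in board])  # ['b', 'c', 'a']
```

Here "b" wins outright on agents done even with the worst normalized reward, and the normalized reward only separates "c" from "a", which are tied on agents done.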