Descriptions of the reward function

#1

Where can I find details on the reward function? For example I see on the leaderboard that the current leading submission has the highest fraction of done agents - how much does that matter for the mean reward vs. time until done?
And for the normalized reward, is the scale -100 to 0?

#2

Hi @ghabs

Sorry for the very late response. The information can be found in the document here. However, it is not very easy to find and understand, so in simple words, scores are currently computed as follows:

  1. Mean number of agents done, in other words how many agents reached their target in time.
  2. Mean reward is just the mean of the cumulated reward.
  3. If multiple participants have the same number of done agents, we compute a "normalized" reward as follows:
normalized_reward = cumulative_reward / (self.env._max_episode_steps + self.env.get_num_agents())
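The normalization above can be sketched as a small helper. This is a hedged illustration, not the official evaluator code: `max_episode_steps` and `num_agents` stand in for `self.env._max_episode_steps` and `self.env.get_num_agents()` from the formula.

```python
def normalized_reward(cumulative_reward: float,
                      max_episode_steps: int,
                      num_agents: int) -> float:
    """Scale the cumulative episode reward by episode length and agent
    count, per the formula quoted above, so that scores from episodes
    of different sizes are comparable."""
    return cumulative_reward / (max_episode_steps + num_agents)

# Example: a cumulative reward of -100 over an episode capped at
# 100 steps with 10 agents.
score = normalized_reward(-100.0, 100, 10)
```

Since the cumulative reward is non-positive, a normalized reward closer to zero means the agents reached their targets faster.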

We are currently updating the FAQ section so the information will be available there soon.

Best regards,

The Flatland Team

#3

Hi @mlerik,

do I understand correctly that the order in which you listed these criteria is also their priority?

E.g. if somebody completes 100% of agents and no one else manages to do that, then they win, is that true?

Thank you.

Best Regards,

Filip

#4

Hi @ryznefil

Yes, this is correct. Given the time limit of the episodes, it is hard to finish with all agents in time. Thus, if you manage traffic as well as possible and finish with the most agents, your solution is considered better.

As in round 1, we expect many participants to achieve similar or the same score. Therefore we rank equal scores according to how fast the agents reach their target, by looking at the normalized reward.
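The ranking rule described above can be sketched as a two-key sort: the fraction of done agents is the primary criterion, and the normalized reward breaks ties. The submission entries below are hypothetical, purely to illustrate the ordering.

```python
# Hypothetical submissions: "done_fraction" is the fraction of agents
# that reached their target, "normalized_reward" is the tie-break score
# (non-positive; closer to zero means agents finished faster).
submissions = [
    {"name": "A", "done_fraction": 1.0, "normalized_reward": -0.42},
    {"name": "B", "done_fraction": 1.0, "normalized_reward": -0.35},
    {"name": "C", "done_fraction": 0.9, "normalized_reward": -0.10},
]

# Sort descending: higher done fraction first, then higher (less
# negative) normalized reward among equal done fractions.
ranking = sorted(
    submissions,
    key=lambda s: (s["done_fraction"], s["normalized_reward"]),
    reverse=True,
)
# B ranks ahead of A (same done fraction, faster agents), and C is
# last despite its better normalized reward, because fewer agents
# reached their target.
```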

Hope this helps.

Best regards,

Erik