I have a question regarding the (possible) reward functions. Currently, the calculation of the reward for an agent is hard-coded into the step() function of the RailEnvironment.
This applies in particular to the values of the rewards and penalties for specific events.
My suggestion would be to turn these local variables into class variables so that we can try different reward functions more easily.
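A minimal sketch of the idea, assuming nothing about the real implementation: the class `ToyRailEnv`, its attribute names, and the numeric values below are all hypothetical and only illustrate how class-level reward values could be overridden without touching the step logic.

```python
class ToyRailEnv:
    """Hypothetical stand-in for the environment; not the actual Flatland class."""

    # Reward values as class variables (names and numbers are illustrative only).
    step_penalty = -1.0
    invalid_action_penalty = -2.0
    global_reward = 10.0

    def reward_for(self, invalid_action: bool, all_done: bool) -> float:
        """Combine the class-level values into a per-step reward for one agent."""
        reward = self.step_penalty
        if invalid_action:
            reward += self.invalid_action_penalty
        if all_done:
            reward += self.global_reward
        return reward


class HarshRailEnv(ToyRailEnv):
    """Variant that only changes one value; the reward logic is inherited."""

    invalid_action_penalty = -10.0
```

With this layout, trying a different penalty only needs a subclass (or an assignment to the class attribute) rather than an edit to the step code.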
Best
Marc-Oliver
Update:
I see that in Flatland 2.0 the reward values are class variables. I am just wondering if the reward calculation should be part of the environment at all?
Logically, the training code should compute the rewards, based on feedback from the environment.
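As a rough sketch of that separation, the training code could wrap the environment and replace the rewards it returns. Everything here is an assumption for illustration: `StubEnv`, `CustomRewardWrapper`, `distance_reward`, and the `distance_to_target` info key are hypothetical, and the wrapper only assumes that `step()` returns an `(obs, rewards, dones, info)` tuple of per-agent dictionaries.

```python
class StubEnv:
    """Hypothetical two-agent environment, used only to make the sketch runnable."""

    def __init__(self):
        # Remaining distance to the target for each agent id.
        self.distance = {0: 3, 1: 1}

    def step(self, actions):
        for agent, action in actions.items():
            # In this stub, action 1 means "move one cell towards the target".
            if action == 1 and self.distance[agent] > 0:
                self.distance[agent] -= 1
        obs = dict(self.distance)
        dones = {agent: dist == 0 for agent, dist in self.distance.items()}
        env_rewards = {agent: -1.0 for agent in self.distance}
        info = {"distance_to_target": dict(self.distance)}
        return obs, env_rewards, dones, info


class CustomRewardWrapper:
    """Discards the environment's rewards and applies a user-supplied function."""

    def __init__(self, env, reward_fn):
        self.env = env
        self.reward_fn = reward_fn

    def step(self, actions):
        obs, _env_rewards, dones, info = self.env.step(actions)
        rewards = {agent: self.reward_fn(agent, dones, info) for agent in actions}
        return obs, rewards, dones, info


def distance_reward(agent, dones, info):
    """Example shaping: bonus on arrival, otherwise a penalty scaled by distance."""
    if dones[agent]:
        return 10.0
    return -0.1 * info["distance_to_target"][agent]


env = CustomRewardWrapper(StubEnv(), distance_reward)
obs, rewards, dones, info = env.step({0: 1, 1: 1})
```

The environment then only needs to report state and events; which of them matter, and by how much, is decided in the training code.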
Thank you for your input. The internal reward is mostly used for submission scoring. We encourage participants to shape their own reward to improve the behavior of their agents.
Do you have any suggestions for how this could be simplified for participants? Are there variables or metrics the environment should provide in order to facilitate custom reward functions?