I have a question regarding the (possible) reward functions. Currently, the calculation of the reward for an agent is hard-coded into the step() function of the RailEnvironment.
In particular, the values of the rewards and penalties for specific events are defined as local variables there.
My suggestion would be to turn these local variables into class variables so that we can try different reward functions more easily.
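As a minimal sketch of what I mean (the class and attribute names here are illustrative, not the actual flatland-rl API): with the reward values as class variables, trying a different reward function is just a subclass override instead of an edit to step().

```python
# Hypothetical sketch, not the real RailEnv implementation: reward values
# lifted out of step() into class variables.
class RailEnvSketch:
    # Reward parameters as class variables instead of locals inside step().
    step_penalty = -1.0
    global_reward = 1.0

    def step_reward(self, agent_done, all_done):
        # The same calculation step() would do, but reading the class
        # variables so subclasses can change the values.
        if all_done:
            return self.global_reward
        if agent_done:
            return 0.0
        return self.step_penalty


class ExperimentEnv(RailEnvSketch):
    # A different reward function is just an override, no edit to step().
    step_penalty = -0.1
    global_reward = 5.0
```

This keeps the reward *calculation* in the environment but makes the *values* configurable per experiment.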
I see that in Flatland 2.0 the reward values are class variables. I am just wondering if the reward calculation should be part of the environment at all?
Logically, the training code should compute the rewards itself, based on feedback from the environment.
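To make that alternative concrete, here is a rough sketch (the event names and reward table are made up for illustration): the environment would only report *what happened* to each agent, and the training code would map those events to scalar rewards however it likes.

```python
# Hypothetical sketch: rewards computed on the training side from
# environment feedback, rather than inside the environment's step().
from enum import Enum, auto


class Event(Enum):
    # Illustrative per-agent events the environment could report.
    MOVED = auto()
    STOPPED = auto()
    ARRIVED = auto()


def compute_reward(events, reward_table):
    # Training-side reward function: sum the value of each reported event.
    # Swapping in a different reward function needs no change to the env.
    return sum(reward_table[e] for e in events)


REWARDS = {Event.MOVED: -1.0, Event.STOPPED: -1.0, Event.ARRIVED: 10.0}

# An agent that moved twice and then arrived:
compute_reward([Event.MOVED, Event.MOVED, Event.ARRIVED], REWARDS)  # → 8.0
```

The environment then stays reward-agnostic, and each experiment just supplies its own reward table or function.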