I have a question regarding the (possible) reward functions. Currently, the calculation of the reward for an agent is hard-coded into the step() function of the RailEnvironment, in particular the values for rewards and penalties for specific events.
My suggestion would be to turn these local variables into class variables so that we can try different reward functions more easily.
I see that in Flatland 2.0 the reward values are class variables. I am just wondering if the reward calculation should be part of the environment at all?
Logically, the training code should compute the rewards, based on feedback from the environment.
Thank you for your input. The internal reward is mostly used for submission scoring. We encourage participants to shape their own reward to improve the behavior of their agents.
Do you have any suggestions on how this could be simplified for participants? Are there variables / metrics the environment should provide in order to facilitate custom reward functions?
I think the environment should return a dict with all (reward) relevant information, such as
- agent tried to perform an illegal move
- agent tried to move to an occupied cell
- agent reached goal
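To make the idea concrete, here is a minimal sketch of how training code could shape its own reward from such a dict. The keys (`illegal_move`, `collision`, `reached_goal`) and the penalty values are just assumptions for illustration, not part of the current Flatland API:

```python
# Hypothetical sketch: a custom reward built from a per-agent info dict.
# The dict keys and reward magnitudes are assumptions, not Flatland API.

def custom_reward(info: dict) -> float:
    reward = -1.0                 # small step penalty to encourage progress
    if info.get("illegal_move"):  # agent tried an invalid action
        reward -= 5.0
    if info.get("collision"):     # agent tried to enter an occupied cell
        reward -= 10.0
    if info.get("reached_goal"):  # agent arrived at its target
        reward += 100.0
    return reward
```

The environment would only report *what happened*; how those events are weighted stays entirely in the participant's training code.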
I think it would also be useful to expose some of the checks in env.step() as functions, for example:

def is_valid_move(agent, action) -> bool
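A sketch of how such a check could be used from the training side, e.g. to mask invalid actions before sampling. The `StubEnv` below stands in for an environment exposing the proposed method; `is_valid_move` is the suggestion above, not an existing Flatland function:

```python
class StubEnv:
    """Toy stand-in for an environment exposing the proposed check."""
    def is_valid_move(self, agent, action):
        # Placeholder rule: pretend only even-numbered actions are valid.
        return action % 2 == 0

def valid_actions(env, agent, all_actions):
    """Filter the action space down to moves the environment would accept."""
    return [a for a in all_actions if env.is_valid_move(agent, a)]
```

With this, an agent could restrict its policy to `valid_actions(env, agent, range(5))` instead of learning to avoid illegal moves purely from penalties.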
Thank you @marcoliver_gewaltig
These are great inputs. I opened an issue here. Will get to this as soon as we have some capacity.
The Flatland Team