I have a question regarding the (possible) reward functions. Currently, the calculation of the reward for an agent is hard-coded into the step() function of the RailEnvironment, in particular the values for rewards and penalties for specific events.
My suggestion would be to turn these local variables into class variables so that we can try different reward functions more easily.
I see that in Flatland 2.0 the reward values are class variables. I am just wondering if the reward calculation should be part of the environment at all?
Logically, the training code should compute the rewards, based on feedback from the environment.
Thank you for your input. The internal reward is mostly used for submission scoring. We encourage participants to shape their own reward to improve the behavior of their agents.
Do you have any suggestions on how this could be simplified for participants? Are there variables / metrics the environment should provide in order to facilitate custom reward functions?
I think the environment should return a dict with all (reward) relevant information, such as
- agent tried to perform an illegal move
- agent tried to move to an occupied cell
- agent reached goal
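To make the idea concrete, here is a minimal sketch of how training code could shape its own reward from such a dict. The keys (`illegal_move`, `collision`, `reached_goal`) and the penalty values are just assumptions for illustration, not part of the current Flatland API:

```python
# Hypothetical sketch: a custom reward built from a per-agent info dict.
# The dict keys and reward magnitudes are assumptions, not Flatland API.

def custom_reward(info: dict) -> float:
    reward = -1.0                 # small step penalty to encourage progress
    if info.get("illegal_move"):  # agent tried an invalid action
        reward -= 5.0
    if info.get("collision"):     # agent tried to enter an occupied cell
        reward -= 10.0
    if info.get("reached_goal"):  # agent arrived at its target
        reward += 100.0
    return reward
```

The environment would only report *what happened*; how those events are weighted stays entirely in the participant's training code.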
I think it would also be useful to expose some of the checks in env.step() as functions, for example:

def is_valid_move(agent, action) -> bool
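A sketch of how such a check could be used from the training side, e.g. to mask invalid actions before sampling. The `StubEnv` below stands in for an environment exposing the proposed method; `is_valid_move` is the suggestion above, not an existing Flatland function:

```python
class StubEnv:
    """Toy stand-in for an environment exposing the proposed check."""
    def is_valid_move(self, agent, action):
        # Placeholder rule: pretend only even-numbered actions are valid.
        return action % 2 == 0

def valid_actions(env, agent, all_actions):
    """Filter the action space down to moves the environment would accept."""
    return [a for a in all_actions if env.is_valid_move(agent, a)]
```

With this, an agent could restrict its policy to `valid_actions(env, agent, range(5))` instead of learning to avoid illegal moves purely from penalties.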
Thank you @marcoliver_gewaltig
These are great inputs. I opened an issue here. Will get to this as soon as we have some capacity.
The Flatland Team