Reward Function Design
Currently, the reward function calculation in user_agent is limited to three data points. I think it might be better to expose more information to users in this function. With the current framework, it wouldn't be difficult to create a custom training process that allows a reward function defined over more variables, perhaps giving teams who implement that an advantage?

@mark_haoxiang thanks for your suggestion. What other information would you like to be made available for the purpose of calculating custom rewards?

It would be nice if the entire observation space, along with the agent's actions, could be passed into the function, just to have the option! E.g. one potential reward would be a negative reward whenever the agent takes an illegal action (such as discharging when the battery level is negative).
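To make the idea concrete, here is a minimal sketch of what such a reward callback could look like, assuming the framework passed the full observation and the agent's action into it. The function name, the `"soc"` observation key, the sign convention (negative action = discharge), and the penalty magnitude are all illustrative assumptions, not part of the current API.

```python
def reward_with_legality_penalty(observation: dict, action: float) -> float:
    """Penalize physically illegal actions.

    observation: assumed to contain the battery state of charge under
                 the key "soc" (a fraction in [0, 1]).
    action:      assumed negative for discharging, positive for charging.
    """
    reward = 0.0
    # Discharging (action < 0) with an empty battery is illegal:
    # apply a fixed negative penalty.
    if action < 0 and observation["soc"] <= 0.0:
        reward -= 10.0  # illustrative penalty magnitude
    return reward


# Discharging on an empty battery is penalized; a legal discharge is not.
print(reward_with_legality_penalty({"soc": 0.0}, action=-1.0))  # -10.0
print(reward_with_legality_penalty({"soc": 0.5}, action=-1.0))  # 0.0
```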

If possible, perhaps some way to construct an unbiased reward that corresponds directly to the evaluation metric. This would require information about the baseline (e.g. what would the cost have been if the agent had done nothing last step?).
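A baseline-relative reward along those lines could be sketched as below, assuming the framework exposed both the cost the agent actually incurred and the counterfactual cost of doing nothing over the same step. Both parameter names are hypothetical.

```python
def baseline_relative_reward(agent_cost: float, do_nothing_cost: float) -> float:
    """Reward is the cost saved relative to a do-nothing baseline.

    Positive when the agent was cheaper than doing nothing, negative
    when it was more expensive, and zero when it changed nothing, so
    maximizing this reward aligns directly with minimizing the metric.
    """
    return do_nothing_cost - agent_cost


# Example: the agent spent 8 units where doing nothing would have cost 10.
print(baseline_relative_reward(agent_cost=8.0, do_nothing_cost=10.0))  # 2.0
```

Summing this per-step reward over an episode recovers the total cost difference against the do-nothing policy, which is what makes it unbiased with respect to the metric.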
