Currently the reward function calculation is limited by user_agent to 3 data points. I think it would be better to expose more information to users in this function: with the current framework it wouldn't be difficult to build a custom training process that defines a reward function over more variables, which could give the teams that implement this an advantage.
It would be nice if the entire observation space, together with the agent actions, could be passed into the function, just to have the option! For example, a potential reward would be a negative reward whenever the agent takes an illegal action (discharging a battery that is already empty), as in the sketch below.
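To make that concrete, here is a minimal sketch of what such a reward could look like, assuming the full observations and actions were exposed; the argument names (soc, actions, base_reward) are hypothetical and not part of the current interface:

```python
from typing import List

# Minimal sketch, not the library's actual API: assume the framework exposes, per
# building, the battery state of charge, the last action, and the default reward.
def reward_with_illegal_action_penalty(
    soc: List[float],          # hypothetical: state of charge per building, in [0, 1]
    actions: List[float],      # hypothetical: storage action per building, negative = discharge
    base_reward: List[float],  # hypothetical: the reward the default function would return
    penalty: float = 1.0,
) -> List[float]:
    """Subtract a fixed penalty whenever an agent tries to discharge an empty battery."""
    rewards = []
    for s, a, r in zip(soc, actions, base_reward):
        illegal = a < 0.0 and s <= 0.0  # discharging with nothing stored
        rewards.append(r - penalty if illegal else r)
    return rewards
```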
If possible, perhaps there could also be some way to construct an unbiased reward that corresponds directly to the evaluation metric. This would require information about the baseline (what would the cost have been if the agent had done nothing last step?).
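A rough sketch of what I mean, assuming the environment could report that do-nothing cost for the last step; both arguments are hypothetical:

```python
# Sketch under assumptions: if the environment exposed the cost the district would
# have incurred with no action (the baseline the metric is normalised against), the
# reward could simply be the agent's improvement over that baseline.
def baseline_relative_reward(agent_cost: float, baseline_cost: float) -> float:
    """Positive when the agent beats doing nothing, negative when it does worse."""
    return baseline_cost - agent_cost
```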
@kingsley_nweye do we have any update on this matter? The UserReward class has a **kwargs argument; does that suggest we can pass other useful information through it?
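For illustration only, this is the kind of usage I have in mind if **kwargs were populated by the environment; every keyword name below is an assumption, not a documented part of the UserReward interface:

```python
# Illustrative only: a class mirroring the shape of a calculate(**kwargs) method,
# assuming the environment forwards extra context. All keys are hypothetical.
class ExampleReward:
    def calculate(self, **kwargs) -> list:
        observations = kwargs.get('observations', [])     # hypothetical: full observation space
        actions = kwargs.get('actions', [])                # hypothetical: last actions taken
        baseline_cost = kwargs.get('baseline_cost', None)  # hypothetical: do-nothing cost of last step
        # ...combine these however the custom reward needs...
        return [0.0 for _ in observations]
```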
Thank you