Internal Reward Dependent on Expert Data and State


Can we have an internal reward that depends on the expert data and the state, or does this count as hard-coding?

E.g., a reward based on finding similar images in the dataset.


As long as the internal reward is learned from the data, this is allowed. It is not allowed if the reward is a hard-coded function of the state and the external data, i.e., if it is computed by directly comparing the state against the dataset at reward time rather than through learned parameters.
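To make the distinction concrete, here is a minimal NumPy sketch. All names and the toy 2-D state distribution are hypothetical, and the learned reward here is just a small logistic-regression discriminator (a stand-in for any reward model fitted to expert data). The key point is that after training, `learned_reward` depends on the state only through the learned parameters, whereas `hard_coded_reward` consults the external data directly at reward time, which is the disallowed case.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: expert states cluster near (1, 1),
# non-expert states are drawn more broadly around the origin.
expert_states = rng.normal(loc=1.0, scale=0.3, size=(200, 2))
other_states = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

X = np.vstack([expert_states, other_states])
y = np.concatenate([np.ones(200), np.zeros(200)])

# Fit a logistic-regression discriminator D(s) ~ P(s is expert-like)
# by gradient ascent on the log-likelihood. The expert data is used
# only here, during learning.
w = np.zeros(2)
b = 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w += lr * (X.T @ (y - p)) / len(y)
    b += lr * np.mean(y - p)

def learned_reward(state):
    """Allowed: a reward learned from the expert data.

    At reward time this is a function of the state and the learned
    parameters (w, b) only; the dataset itself is no longer touched.
    """
    return float(1.0 / (1.0 + np.exp(-(state @ w + b))))

def hard_coded_reward(state, dataset):
    """Disallowed: a direct function of the state and external data.

    The reward is computed by looking up the nearest expert state in
    the raw dataset at reward time, i.e., nothing is learned.
    """
    return float(-np.linalg.norm(dataset - state, axis=1).min())
```

Both rewards rank an expert-like state above a far-away one, but only the first does so via parameters distilled from the data, which is what keeps it on the allowed side of the rule.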