Internal Reward Dependent on Expert Data and State

Hi,

Can we have an internal reward that depends on both the expert data and the state, or does this count as hard coding?

E.g., a reward based on how similar the current observation is to images in the dataset.

Thanks

This is allowed as long as the internal reward is learned from the data. It is not allowed if the reward is computed directly as a function of the state and external data, since that counts as hard coding.
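
To make the distinction concrete, here is a minimal sketch of a reward that is learned from data: a small logistic-regression model trained to tell expert states from other states, whose output probability is then used as the reward. Everything in it is an assumption for illustration: the function names, the 4-dimensional feature vectors standing in for image embeddings, and the synthetic "other" data standing in for non-expert states. It is not an official ruling on what is permitted. By contrast, a direct nearest-neighbour lookup against the dataset at reward time would seem to fall on the "direct function of the state and external data" side, though that reading is also an assumption.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_reward_model(expert_feats, other_feats, lr=0.1, steps=200):
    # Hypothetical helper: fit logistic regression with batch gradient descent.
    # Expert states are labelled 1, all other states 0.
    data = [(x, 1.0) for x in expert_feats] + [(x, 0.0) for x in other_feats]
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * dim, 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, gw)]
        b -= lr * gb / len(data)
    return w, b

def learned_reward(state_feat, w, b):
    # The reward is the model's estimated probability that the state is expert-like.
    # Only the learned parameters (w, b) are used here, not the dataset itself.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, state_feat)) + b)

# Synthetic stand-ins for feature vectors (assumed, not real competition data).
rng = random.Random(0)
expert = [[rng.gauss(1.0, 1.0) for _ in range(4)] for _ in range(100)]
other = [[rng.gauss(-1.0, 1.0) for _ in range(4)] for _ in range(100)]

w, b = train_reward_model(expert, other)
print(learned_reward([1.0, 1.0, 1.0, 1.0], w, b))      # expert-like state
print(learned_reward([-1.0, -1.0, -1.0, -1.0], w, b))  # non-expert-like state
```

The point of the design is that the dataset is only touched during training; at reward time the function depends on the state and the learned parameters alone.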