Hi,
Can we have an internal reward that depends on the expert data and the state, or does this count as hard-coding?
E.g., a reward based on similar images in the dataset.
Thanks
As long as the internal reward is learned from the data, this is allowed. It is not allowed if the reward is computed directly as a function of the state and external data.
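As a rough illustration of that distinction (a sketch only, not an official ruling): the first function below computes the reward directly from the current state and the dataset, which is the case described as not allowed. The second trains a small model on the data first and then scores states using only the learned parameters. All function names are hypothetical, the 2-D "states" are toy stand-ins for image features, and logistic regression is just one assumed choice of reward model.

```python
import math
import random


def direct_similarity_reward(state, expert_states):
    """Reward computed directly from the state and the dataset at run time.

    This is a fixed, hand-written function of (state, external data),
    so under the reply above it would likely count as hard-coded.
    """
    return -min(math.dist(state, e) for e in expert_states)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_reward_model(expert_states, other_states, epochs=200, lr=0.5):
    """Fit a logistic-regression reward model with plain SGD.

    Expert states are labelled 1, other states 0. Returns (weights, bias).
    """
    dim = len(expert_states[0])
    w = [0.0] * dim
    b = 0.0
    data = [(x, 1.0) for x in expert_states] + [(x, 0.0) for x in other_states]
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def learned_reward(state, model):
    """Reward from the learned parameters only; the dataset is not consulted here."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, state)) + b)


# Toy data (made up): expert states cluster near (1, 1), other states near (0, 0).
rng = random.Random(0)
expert = [[1.0 + rng.uniform(-0.1, 0.1), 1.0 + rng.uniform(-0.1, 0.1)] for _ in range(20)]
other = [[rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)] for _ in range(20)]

model = train_reward_model(expert, other)
```

With this toy data, `learned_reward([1.0, 1.0], model)` should come out higher than `learned_reward([0.0, 0.0], model)`. The point is that the similarity to expert data enters through training, not through a lookup against the dataset at reward time.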