Training a metacontroller on expert data manually divided into stages by reward

shadowyzy · October 19, 2019, 7:07am

Can I train a metacontroller on expert data that I have manually divided into several stages by reward, or does this count as hard-coding?

E.g., stage 1 is the period before any reward has been received.
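For illustration, here is a minimal sketch of the kind of split described above. It assumes each demonstration is available as a per-step list of rewards. `label_stages` is a hypothetical helper, not part of any library.

```python
from typing import List, Sequence


def label_stages(rewards: Sequence[float]) -> List[int]:
    """Assign a stage index to each timestep of one expert trajectory.

    Stage 1 covers the steps before any reward is received. Each nonzero
    reward closes the current stage, so the steps after it fall into the
    next stage.
    """
    stages = []
    stage = 1
    for r in rewards:
        stages.append(stage)
        if r != 0:
            stage += 1  # the rewarding step belongs to the stage it completes
    return stages


# Hypothetical per-step rewards from one demonstration
print(label_stages([0, 0, 1, 0, 0, 2, 0]))  # → [1, 1, 1, 2, 2, 2, 3]
```

The stage labels could then be used to group timesteps into per-stage datasets for training sub-policies. Whether this is permitted is the question being asked here.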

kaixin · October 19, 2019, 12:52pm

I guess it is not allowed. The reward is closely tied to the inventory, so using reward to split stages is equivalent to using the inventory observation.