Thanks for the ping! In general, the rule is that reward shaping cannot depend on the agent's state. Scaling rewards, including zeroing them out, is fine as long as this is not used to exploit the agent's state to shape the rewards.
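To make the distinction concrete, here is a minimal sketch using standard gym wrappers (the `inventory` key is illustrative, not a statement about the exact observation schema):

```python
import gym

class ScaledReward(gym.RewardWrapper):
    """Allowed: rescale (or zero) every reward by a fixed constant,
    independent of anything the agent observes."""
    def __init__(self, env, scale=0.1):
        super().__init__(env)
        self.scale = scale

    def reward(self, reward):
        return reward * self.scale


class StateShapedReward(gym.Wrapper):
    """NOT allowed: the shaping bonus is a function of the
    observation, i.e. it depends on the agent's state."""
    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if obs["inventory"]["log"] > 0:  # state-dependent bonus
            reward += 1.0
        return obs, reward, done, info
```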
@weel2019 How do you intend to supervise learning? We intended learning features of the data from human demonstrations to be an integral part of the competition! However, we don't allow external datasets to be included in the learning procedure, so, for example, hand-annotating 1,000 frames with new labels is not allowed. Furthermore, in round 2 we modify Minecraft with a new texture pack, so any annotations would not be valid or carry over.
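In other words, any supervision signal has to come from the bundled demonstrations themselves. A rough sketch of what that looks like, assuming the `minerl` data API (environment name and batch parameters are illustrative; check the docs for exact signatures):

```python
import minerl

# Load only the demonstration data shipped with the competition;
# no externally sourced labels or datasets may be mixed in.
data = minerl.data.make('MineRLObtainDiamond-v0')

for state, action, reward, next_state, done in data.batch_iter(
        batch_size=32, seq_len=32, num_epochs=1):
    # e.g. behavioural cloning: fit a policy to predict `action`
    # from `state` -- the labels come from the demonstrations,
    # not from hand annotation.
    pass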
Note that even when parameters are learned, they can still count as hard-coded if they are used to fix a policy or meta-controller. For example, if you use summaries of the dataset to encode constants, e.g. max(num_diamonds * c) = c, this would still count as hard-coding.
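To illustrate the kind of pattern this rules out (a sketch only; the constant, key names, and high-level actions are hypothetical):

```python
# Counts as hard-coding: a constant distilled from a dataset summary
# is precomputed offline and baked into a fixed meta-controller.

MAX_DIAMONDS = 2  # hypothetical value precomputed from the demonstration data

def meta_controller(inventory):
    # The switching rule is fixed by the precomputed constant rather
    # than being learned as part of the training procedure.
    if inventory.get("diamond", 0) >= MAX_DIAMONDS:
        return "craft"  # hypothetical high-level action
    return "mine"
```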