I am unclear on the rules for training. Can I use all of the provided environments and the provided demonstration data for training? Also, I believe the rules state that we are not allowed to specify an agent’s action policy by hand, but how about an ‘environment training policy’, something like train x number of times in Navigate, y number of times in Treechop, etc.?
You can use any environment you would like for training! For testing, your agent will only be evaluated on MineRLObtainDiamond-v0, and you are welcome to choose the number of training examples by hand!
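To make the "x times in Navigate, y times in Treechop" idea concrete, here is a minimal sketch of a hand-specified environment schedule. The episode counts, the `build_schedule` helper, and the shuffling are all assumptions for illustration, not something the competition prescribes, and the training loop itself is left out.

```python
import random

# Hand-chosen episode counts per training environment (placeholder values).
TRAINING_SCHEDULE = {
    "MineRLNavigate-v0": 3,
    "MineRLTreechop-v0": 2,
    "MineRLObtainDiamond-v0": 1,
}


def build_schedule(counts, seed=0):
    """Expand per-environment counts into a shuffled list of environment IDs.

    Hypothetical helper: each ID appears exactly as many times as requested.
    """
    order = [env_id for env_id, n in counts.items() for _ in range(n)]
    random.Random(seed).shuffle(order)  # fixed seed keeps the order reproducible
    return order


schedule = build_schedule(TRAINING_SCHEDULE)
for env_id in schedule:
    # A real run would create the environment here and train for one episode.
    print("train one episode in", env_id)
```

Only the environment order is fixed by hand here; the agent's actions inside each environment would still come from the learned policy.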