Are there any restrictions on the environment settings during the training stage? Specifically:
- Should we reset the environment at any point during training? Is there a minimum or maximum number of steps per episode?
- Is there a maximum number of episodes for the training (pretraining) phase in Round 1?
- Is the maximum number of steps in Round 2 (intrinsic stage) set to 10M?
- Is there a maximum number of episodes in Round 2?
- Can we freely initialize objects in a non-random way during training?
- Can we present the environment without objects (arm only) during the training stage?
- Can we use curriculum learning during the training stage?
- Is it possible to restrict the arm's action space during the training stage?
Thanks for your answers.