Dear michalvavrecka,
the environment should be used “as it is” to stay within the spirit of the rules.
However, given the difficulty of the challenge, we allow some exceptions for Round 1.
As long as the Golden Rule is not violated (see Rules), all submissions will be considered valid and ranked for Round 1.
However, only submissions fully complying with the spirit of the rules will access Round 2 and take part in the final ranking.
It is possible to make multiple submissions in Round 1.
So, for your questions:
No resets allowed. The environment should be kept as it is. The only “reset” available is that objects which go out of bounds are automatically placed back on the table.
In theory, the agent should learn within 10M steps. While this is not checked by the automatic evaluation in Round 1, an agent which needs 1000M steps to learn something will probably fail Round 2.
Yes.
There are no episodes (at least from an external point of view - the agent can split the 10M-step experience as it wishes).
No, except for debugging purposes (i.e. not when submitting).
No. But I would like to mention that the `real_robots` package contains a second environment with only the cube in it, as a way to simplify things while debugging your algorithm.
Use `env = gym.make('REALRobotSingleObj-v0')` to create it.
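For example, a minimal loop to try it out (a sketch assuming the standard Gym interface; the random actions are just for smoke-testing):

```python
import gym
import real_robots  # noqa: F401 -- importing registers the REAL Robot envs

# Single-object debug environment (only the cube on the table).
env = gym.make('REALRobotSingleObj-v0')
observation = env.reset()  # initial reset only; further resets are not allowed

for _ in range(1000):
    action = env.action_space.sample()  # random action, just to see it run
    observation, reward, done, info = env.step(action)

env.close()
```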
Given the above, it is not possible to make a curriculum by progressively increasing the difficulty of the environment.
It is of course possible for the agent to build a curriculum by itself by focusing on some aspects first,
e.g. it would be fine if the agent concentrated on controlling its arm first.
In general, no. This is because one of the difficulties of open-ended autonomous learning is dealing with large action spaces, which often contain only a very “tiny subspace” of actions useful for the task at hand.
So when you restrict the action space you sidestep part of this challenge, and you also indirectly give away some information about the task which the agent is not supposed to have.
However, one thing we noticed is that the environment is hard to predict when the arm moves at full speed, since very small differences in the starting position result in big differences in the outcome of collisions with objects.
Due to this we may allow restrictions on the speed of the robot in both Round 1 and Round 2 - but not restrictions on the range of movements (e.g. restricting the joints so that the arm is always over the table); see the sketch below for what a speed restriction might look like.
You may still restrict the action space to make a Round 1-only submission (as explained above).
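To illustrate the speed-restriction idea, here is a minimal sketch of an action wrapper that caps how fast the commanded joint positions may change per step. It is only a sketch: it assumes a Box action space of target joint positions, while the actual challenge action space may be structured differently, and `max_delta` is a made-up value.

```python
import gym
import numpy as np

class SpeedLimitWrapper(gym.ActionWrapper):
    """Limit how fast the commanded joint positions may change per step.

    Illustrative only: assumes a Box action space of target joint
    positions; the actual challenge action space may differ.
    """

    def __init__(self, env, max_delta=0.05):
        super().__init__(env)
        self.max_delta = max_delta   # max change per joint per step (made up)
        self._last_command = None

    def action(self, action):
        action = np.asarray(action, dtype=np.float64)
        if self._last_command is None:
            self._last_command = action
            return action
        # Clip the step-to-step change of the command, capping the speed.
        delta = np.clip(action - self._last_command,
                        -self.max_delta, self.max_delta)
        self._last_command = self._last_command + delta
        return self._last_command

# Usage: env = SpeedLimitWrapper(gym.make('REALRobotSingleObj-v0'))
```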
So, to be clear, you mean that in order to try to reach the top 10 of Round 1, anything that doesn’t violate the Golden Rule will be allowed. A submission not complying with the spirit of the rules, but respecting the Golden Rule, can still allow participation in Round 2, **in which the user should then use another algorithm** that is compliant. Is that correct?
Dear mrrobot,
to participate in Round 2 you have to be ranked in the Top 20 (not the Top 10) of Round 1 with an algorithm complying with the spirit of the rules.
You can have multiple submissions, some complying and some not complying with the spirit of the rules. Only the complying ones are valid for entering the Top 20.
Yes, I think the current time limit is set to 12 hours for the extrinsic phase (Round 1) and 60 hours for the intrinsic + extrinsic phases combined (Round 2).
You can wait in between each step to do processing.
Running an environment step alone takes a substantial amount of time, so it is a very good idea to do parallel processing and train a neural network (or do other computations) in the meantime; see the sketch below.
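As an illustration, one way to overlap environment stepping with learning is to push transitions onto a queue consumed by a background thread. This is only a sketch: the commented-out `update_network` call is a placeholder for your own learner, and the random policy is just a stand-in.

```python
import queue
import threading

import gym
import real_robots  # noqa: F401 -- importing registers the REAL Robot envs

env = gym.make('REALRobotSingleObj-v0')
transitions = queue.Queue(maxsize=10000)

def trainer():
    """Background thread: consume transitions and run learning updates
    while the (slow) environment step executes in the main thread."""
    while True:
        item = transitions.get()
        if item is None:             # sentinel: stop training
            break
        # update_network(item)       # your learning update would go here

thread = threading.Thread(target=trainer, daemon=True)
thread.start()

obs = env.reset()
for _ in range(1000):
    action = env.action_space.sample()           # placeholder policy
    next_obs, reward, done, info = env.step(action)
    transitions.put((obs, action, reward, next_obs))
    obs = next_obs

transitions.put(None)    # tell the trainer to stop
thread.join()
```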
We currently launch on a machine with 4 CPUs, 26 GB RAM and a K80 GPU (Google’s n1-standard-4 machine + GPU).
We plan to upgrade that to a machine with 8 CPUs, 30 GB RAM and a V100 GPU for Round 2 (Google’s n1-standard-8 machine + GPU).
(We might increase to that already for Round 1 but I’d have to check).
Can we compute the end-effector position from joint angles as part of the state representation?
Also, is it ok to sample goals from the representation space (“learned” with CV for Round 1 and learned in a fully unsupervised way for Round 2)? I’m asking because the word explicitly is not completely clear to me in this quote:

> during the intrinsic phase the robot is not explicitly given any task to learn and it does not know of the future extrinsic tasks,
> Can we compute the end-effector position from joint angles as part of the state representation?
Well, if you only use joint angles and knowledge of the robot structure, you are not giving it any information about the environment (in the sense of the environment outside the robot itself), nor are you giving it information about the task, so I think this can be allowed.
(A stricter interpretation might include the robot itself as part of the environment and thus part of the task but…that would be harsh :))
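To make the point concrete: computing the end-effector position from joint angles is plain forward kinematics. A minimal sketch for a hypothetical planar 2-link arm (the link lengths are made up; the real robot's kinematic chain would come from its own geometry):

```python
import numpy as np

def end_effector_xy(theta1, theta2, l1=0.4, l2=0.3):
    """Forward kinematics of a planar 2-link arm (illustrative only).

    theta1, theta2: joint angles in radians.
    l1, l2: link lengths in meters (made-up values).
    """
    x = l1 * np.cos(theta1) + l2 * np.cos(theta1 + theta2)
    y = l1 * np.sin(theta1) + l2 * np.sin(theta1 + theta2)
    return x, y

print(end_effector_xy(0.0, np.pi / 2))  # approximately (0.4, 0.3)
```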
> Also, is it ok to sample goals from the representation space (“learned” with CV for Round 1 and learned in a fully unsupervised way for Round 2)? I’m asking because the word explicitly is not completely clear to me in this quote:
> during the intrinsic phase the robot is not explicitly given any task to learn and it does not know of the future extrinsic tasks,
Yes, it is ok.
Indeed, one of the articles we suggested (Visual Reinforcement Learning with Imagined Goals) samples goals from the learned representation space; see the sketch below.
I think you can read that quote without the word explicitly.
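For reference, the core of that goal-sampling idea fits in a few lines. This is only a schematic sketch: `latent_dim` is arbitrary, and the encoder that maps observations into the latent space (e.g. a VAE, as in the paper) is assumed to exist already.

```python
import numpy as np

latent_dim = 16   # size of the learned representation (illustrative)

def sample_goal():
    """Sample an 'imagined' goal from the latent prior N(0, I),
    as in Visual Reinforcement Learning with Imagined Goals."""
    return np.random.randn(latent_dim)

def intrinsic_reward(latent_obs, latent_goal):
    """Reward progress toward the self-generated goal as negative
    distance in representation space."""
    return -np.linalg.norm(latent_obs - latent_goal)

goal = sample_goal()
obs_latent = np.zeros(latent_dim)   # stand-in for encoder(observation)
print(intrinsic_reward(obs_latent, goal))
```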