I have received some questions from participants by email, which I am reposting here for everyone.
Round 1 is closing, but these clarifications will apply to Round 2 as well.
While discussing these, we also decided to explicitly permit cropping the observation image;
it may be cropped in the following manner:
cropped_observation = observation['retina'][0:180,70:250,:]
See questions below for the rationale.
Can we use computer vision techniques (such as OpenCV) to generate internal rewards, such as the distance between the gripper and the cube, in the intrinsic phase, or is this not allowed?
No, it is not allowed to make a reward explicitly tied to the distance between the gripper and the cube, because that would give the robot information about the extrinsic task.
The robot does not know that the cube (or moving the cube) will be a “goal” later.
On the other hand, it is allowed to build intrinsic rewards that are general and would apply to any task.
For example, Pathak’s Curiosity-driven Exploration by Self-supervised Prediction gives rewards for unpredicted events; we would expect such an algorithm to give a reward to the robot the first time it hits (and moves) the cube as it would be an unpredicted event, and to keep giving rewards whenever it moves the cube in new, unexpected directions.
In general, it is difficult to create a reward that moves the robot immediately to the cube without giving it “forbidden” information - it has to hit the cube by chance at least once before intrinsic rewards can help it to reach it again and more often.
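To make the idea of a "general" intrinsic reward concrete, here is a minimal sketch of a prediction-error reward. It is not Pathak's actual method (which learns its own feature space with an inverse model); it swaps in a plain linear forward model on raw state vectors, and the class name, dimensions, and learning rate are all hypothetical. The point is only that the reward depends on how surprising a transition is, never on the cube specifically.

```python
import numpy as np


class ForwardModelCuriosity:
    """Hypothetical helper: the reward is the prediction error of a linear forward model."""

    def __init__(self, state_dim, action_dim, lr=0.1):
        # Linear model predicting next_state from [state, action]; starts knowing nothing.
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr = lr

    def reward_and_update(self, state, action, next_state):
        x = np.concatenate([state, action])
        error = next_state - self.W @ x
        reward = 0.5 * float(error @ error)  # large when the outcome was unpredicted
        # One gradient step on the squared error, so repeated events become predictable.
        self.W += self.lr * np.outer(error, x)
        return reward


if __name__ == "__main__":
    model = ForwardModelCuriosity(state_dim=2, action_dim=2)
    state = np.array([0.1, 0.2])
    push = np.array([1.0, 0.0])
    outcome = np.array([0.3, 0.2])
    # The same transition repeated many times yields a shrinking reward.
    for step in range(5):
        print(step, model.reward_and_update(state, push, outcome))
```

With a model like this, the first contact with any object produces a large reward and repeating the same movement produces less and less, which is the behaviour described above.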
Is it possible to use OpenCV to locate the red cube?
No, using OpenCV to directly locate the red cube is forbidden.
However, using OpenCV is not forbidden per se, as long as one does not put specific knowledge about the task in it (for example, locating the red cube because we need to touch it).
As an example, the baseline does use OpenCV. We used it to subtract the background of the collected images so that the VAE only processed the images of what had changed before and after the action was performed.
Using a VAE also locates the red cube in a way, since the latent space usually correlates with the x,y position of the cube - however, this is allowed because it does so in a general manner (i.e. the VAE would also work for other objects, and even if the task were different, such as rotating the cube or putting the arm in a specific position).
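As an illustration of task-agnostic preprocessing, the sketch below keeps only the pixels that changed between the image taken before an action and the one taken after it. This is not the baseline's actual code: it uses plain NumPy instead of OpenCV (cv2.absdiff computes the same per-pixel difference), and the function name and threshold value are hypothetical. Nothing in it refers to the cube or its colour.

```python
import numpy as np


def changed_pixels(before, after, threshold=10):
    """Return `after` with every pixel that did not change since `before` set to zero."""
    # Cast to a signed type so subtracting uint8 images cannot wrap around.
    diff = np.abs(after.astype(np.int16) - before.astype(np.int16))
    # A pixel counts as changed if any colour channel moved by more than the threshold.
    mask = diff.max(axis=-1) > threshold
    return after * mask[..., None].astype(after.dtype)


if __name__ == "__main__":
    before = np.full((240, 320, 3), 255, dtype=np.uint8)  # blank white scene
    after = before.copy()
    after[100:110, 150:160] = (200, 0, 0)                 # something moved here
    result = changed_pixels(before, after)
    print(result.any(axis=-1).sum())                      # number of changed pixels
```

The same function would highlight the arm, the cube, or any other object that moved, which is what makes it general.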
Is it okay to crop the observation image?
It is difficult to crop the image in a “general way” without introducing some knowledge about the task, since the temptation would be to just zoom in on the table, where the real action is. To avoid breaking the rules, one would have to invent something clever so that the robot itself learns where to focus its attention and then does the crop by itself.
On the other hand, the provided 320x240 image observation is mostly blank. We used a very wide field of view, which goes well beyond the table (especially at the bottom of the image, where the arm would rarely go).
So, if it is needed for performance reasons, we allow, as a special exception, cropping the image to 180x180 as follows:
image = observation['retina'][0:180,70:250,:]
This crops all the extra “white” on the sides and at the bottom. It also crops some information about the arm, since the arm can easily go beyond the table.
Note that resizing the whole image (or the 180x180 crop) is always allowed, since it alters the entire image at once, without bias.
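For reference, here is a small sketch combining the permitted crop with a uniform resize. It assumes the 'retina' entry is a NumPy array of shape (240, 320, 3), i.e. rows first, which matches the slice indices above. The function names and the block-averaging downsample are illustrative choices only; any resize that treats the whole image uniformly (for example cv2.resize) would serve the same purpose.

```python
import numpy as np


def crop_retina(observation):
    """Apply the permitted 180x180 crop to the 'retina' image."""
    return observation['retina'][0:180, 70:250, :]


def downsample(image, factor=3):
    """Shrink an (H, W, C) image by averaging factor x factor blocks uniformly."""
    h, w, c = image.shape
    h2, w2 = h // factor, w // factor
    # Drop any remainder rows/columns, then group pixels into blocks.
    blocks = image[:h2 * factor, :w2 * factor].reshape(h2, factor, w2, factor, c)
    return blocks.mean(axis=(1, 3)).astype(image.dtype)


if __name__ == "__main__":
    # Stand-in observation; a real one would come from the environment.
    observation = {'retina': np.zeros((240, 320, 3), dtype=np.uint8)}
    cropped = crop_retina(observation)
    small = downsample(cropped, factor=3)
    print(cropped.shape, small.shape)
```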
Is it possible to reset the environment?
It is not allowed to reset the environment - however, the environment will automatically reset the position of the objects whenever they fall off the table.