Quoting W. Guss from Discord on a very similar question:
We want to make it ILLEGAL for participants to make solutions like this that USE game knowledge.
Beyond stacking actions, you could imagine people using “movement detectors” to reverse engineer the action obfuscation, and then hard-code sequences/options (similar to scripted crafting).
Even with a rule against “hard-coding” like the one above, I could just give these extracted action sequences to a policy as options for it to execute. But crucially, I would be using knowledge about Minecraft to produce those options, rather than learning them from data with a “blind eye” to those options existing, which is what the rules intend.
I think data augmentation in the style of CURL, image flipping, etc., doesn’t encode game knowledge, but it’s still hard to write down a rule like this precisely.
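To illustrate the distinction, here is a minimal sketch of the kind of game-agnostic augmentation being described: a random crop (as used in CURL) and a horizontal flip applied to raw pixel observations. The function names and sizes are illustrative, not from any competition codebase; nothing here relies on Minecraft semantics.

```python
import numpy as np

def random_crop(obs, out_size=56):
    """Randomly crop a HxWxC image observation (CURL-style); game-agnostic."""
    h, w = obs.shape[:2]
    top = np.random.randint(0, h - out_size + 1)
    left = np.random.randint(0, w - out_size + 1)
    return obs[top:top + out_size, left:left + out_size]

def horizontal_flip(obs, p=0.5):
    """Flip the image left-right with probability p; uses no game knowledge."""
    return obs[:, ::-1] if np.random.rand() < p else obs

obs = np.random.rand(64, 64, 3)  # dummy pixel observation
aug = horizontal_flip(random_crop(obs))
```

Both transforms would work unchanged on observations from any pixel-based environment, which is exactly the property being asked for.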
If your solution wouldn’t work on other domains that you knew nothing about (minus hyperparameters), you’re probably doing something wrong. For example, suppose we randomly trained your solution on an RTS like StarCraft: if it’s not general enough to work at all in that setting, that’s bad, and we don’t want people to develop solutions which overfit to Minecraft.
So if you “extract” options by knowing that there is a fixed sequence of items in the game or whatever, that’s overfitting to Minecraft and not the intention of the competition.
For all intents and purposes, the environment should just be a black box to your algorithm: pixel observations plus a 64-dimensional vector observation, and a 64-dimensional action vector.
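A sketch of what that black-box contract looks like, assuming a gym-style `reset`/`step` API (the class and shapes here are illustrative, not the actual competition interface):

```python
import numpy as np

class BlackBoxEnv:
    """Illustrative black-box env: the agent sees only pixels plus a
    64-dim vector, and emits a 64-dim action; no game semantics exposed."""

    def reset(self):
        pixels = np.zeros((64, 64, 3), dtype=np.uint8)   # raw pixel observation
        vector = np.zeros(64, dtype=np.float32)          # 64-dim vector observation
        return pixels, vector

    def step(self, action):
        assert action.shape == (64,)   # 64-dim action vector, nothing more
        obs = self.reset()             # dummy transition for the sketch
        reward, done = 0.0, False
        return obs, reward, done, {}

env = BlackBoxEnv()
obs = env.reset()
obs, reward, done, info = env.step(np.zeros(64, dtype=np.float32))
```

An algorithm that only touches these arrays, and never interprets what the dimensions mean in Minecraft, satisfies the black-box requirement by construction.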
That’s not to say you can’t use techniques like “automatic option extraction”; it’s just that those techniques should be generic.
So if that means, e.g., training an auto-encoder on all of the (state, action) pairs, doing k-means to get clusters of actions, and then training a neural network for each option — that’s good!
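The clustering step of that pipeline can be sketched as follows. This is a simplified version under stated assumptions: it clusters raw 64-dim action vectors directly with scikit-learn's `KMeans` (the full pipeline would first embed (state, action) pairs with an auto-encoder), and the dataset is random placeholder data, not real demonstrations.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder dataset standing in for logged (state, action) pairs;
# in the real pipeline these would come from demonstration data.
rng = np.random.default_rng(0)
actions = rng.normal(size=(1000, 64))  # 64-dim action vectors

# Step 1 (omitted here): train an auto-encoder on (state, action) pairs
# and use its latent codes instead of the raw actions below.

# Step 2: k-means over the action vectors to get candidate options.
k = 8
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(actions)
labels = km.labels_

# Step 3: group the data by cluster; a separate policy network would
# then be trained on each group to execute that option.
options = {c: actions[labels == c] for c in range(k)}
```

The point is that nothing in this procedure knows it is looking at Minecraft: the same code runs on any dataset of fixed-size action vectors.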