When reading through the rules, I saw that one cannot pre-process the visual data via edge detection methods. This seems like a very strange rule to me, as it basically excludes biological approaches based on e.g. center-surround receptive fields (which perform edge detection). In the brain, these are basically fixed and do not really learn. AFAIK, learning happens in V1 (sparse coding), but not before that.
It seems strange to exclude biologically realistic methods in such a competition, limiting it only to end-to-end deep learning models. If it were up to me, I would just remove this rule, as I don’t feel like it is “cheating” to simulate the brain.
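For readers less familiar with the term: a center-surround receptive field is commonly modeled as a difference of Gaussians (a narrow "center" blur minus a wider "surround" blur), which is a fixed, non-learned filter that responds at edges and stays silent on uniform regions. Below is a minimal sketch of that idea, assuming NumPy and a 2-D grayscale frame. The function names and the sigma values are illustrative assumptions, not something taken from the competition kit or any particular library.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel, truncated at about 3 sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(image, sigma):
    """Separable Gaussian blur with edge padding; output has the input's shape."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    padded = np.pad(image, pad, mode="edge")
    # Blur along rows first, then along columns.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def center_surround(image, sigma_center=1.0, sigma_surround=2.0):
    """Difference-of-Gaussians response: fixed weights, no learning involved."""
    image = np.asarray(image, dtype=np.float64)
    return blur(image, sigma_center) - blur(image, sigma_surround)

# Example input: a vertical step edge (dark left half, bright right half).
frame = np.zeros((32, 32))
frame[:, 16:] = 1.0
response = center_surround(frame)
```

Nothing in this filter refers to the game: it would run unchanged on any pixel input and only assumes that images contain local contrast.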
Quoting W. Guss from Discord on a very similar question:
We want to make it ILLEGAL for participants to make solutions like this that USE game knowledge.
Beyond stacking actions, you could imagine people using “movement detectors” to reverse engineer the action obfuscation, and then hard-code sequences/options (similar to scripted crafting).
Even with a rule against “hard coding” like above, I could just give these extracted action sequences to a policy as options for it to execute, but crucially I am using knowledge about minecraft to produce those options, rather than learning them from data with a “blind eye” to those options existing based on the rules
I think data augmentation in the style of CURL or image flipping etc, doesn’t encode game knowledge, but still it’s hard to write down a rule like this precisely
If your solution wouldn’t work on other domains that you knew nothing about (minus hyperparameters), you’re probably doing something wrong. E.g. let’s say we randomly trained your solution on an RTS like Starcraft: if it’s not general enough to work at all in that setting, that’s bad, and we don’t want people to develop solutions which overfit to Minecraft.
So if you “extract” options by knowing that there is a fixed sequence of items in the game or whatever, that’s overfitting to Minecraft and not the intention of the competition.
For all intents and purposes, the environment should just be a black box to your algorithm, with pixel + 64-dimensional vector observations and a 64-dimensional action vector.
That’s not to say you can’t do techniques like “automatic option extraction”; it’s just that those techniques should be generic.
So if that means e.g. training an auto-encoder on all of the (state, action) pairs, then doing k-means to get clusters of actions, and then training a neural network for each option, that’s good!
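To make the "generic option extraction" idea concrete, here is a rough sketch of only the clustering step, assuming the vectors to cluster are already available as rows of a NumPy array (for example raw 64-dimensional action vectors, or auto-encoder embeddings of them). The auto-encoder and the per-option networks are left out, and the `kmeans` helper with its parameters is a hypothetical illustration rather than anything from the competition code.

```python
import numpy as np

def kmeans(points, k, n_iters=50, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=np.float64)
    # Initialize centroids from k distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the final centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)
```

Each resulting cluster index could then serve as a discrete "option". The point of the sketch is that it only sees anonymous vectors, so it uses no knowledge of what any dimension means in the game.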
I agree that it’s not an easy line to draw, but I would say edge detection should definitely be allowed.
There is nothing truly “generic” in ML except for random search - everything useful has a prior basically. So going by what I just read, convnets should be banned, as they are not a “blind eye” either - they assume that the image is somewhat “natural” (it has spatial correlations). Edge detection is just one step further than this - it assumes the correlations are “sharp” enough to form edges.
Here is how I would draw the (still fuzzy) line: if it would at least run on other games with pixel data (e.g. Atari), not necessarily performing well, then it should be allowed. This would exclude the “action hacking”.
@CireNeikual Hey, you’re a familiar name. Aren’t you the OgmaNeo person? If you’re talking about your ImageEncoder, I think it should comply with the rules.
@anyonic_ai That’s me! I am indeed referring to the ImageEncoder, but also other methods we have been working on. The ImageEncoder has learning, but some other methods do not need learning.