Architectures of Round 1 winners

I’m very interested in how domain-general the architectures in Round 1 were, or whether they involved techniques like reward shaping (and if so, how much). This would help in interpreting the significance of their achievement.

Will the architectures be released at some point? And if not, would it be possible to find out, in broad strokes, the extent to which the winner used hardcoded knowledge?

I’d say a lot of the entries that used the Dopamine Rainbow DQN tutorial reached a mean floor of 5-6.

Some of the Round 1 tricks involved limiting the agent’s action space to 6-8 actions.
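
For anyone who hasn't tried it, here is a rough sketch of what restricting the action space can look like. It is not any participant's actual code. The branch sizes (3, 3, 2, 3), the meaning of each branch, and the eight chosen combinations are all my assumptions for illustration, and `ReducedActionSpace` is a hypothetical helper, not part of any library.

```python
from itertools import product

# Assumed full action space: four independent branches whose sizes multiply
# to 54 combinations. Both the sizes and the branch meanings are assumptions.
BRANCH_SIZES = (3, 3, 2, 3)
FULL_ACTIONS = list(product(*(range(n) for n in BRANCH_SIZES)))

# Hypothetical hand-picked subset, as (move, rotate, jump, strafe) tuples.
REDUCED_ACTIONS = [
    (0, 0, 0, 0),  # no-op
    (1, 0, 0, 0),  # move forward
    (2, 0, 0, 0),  # move backward
    (0, 1, 0, 0),  # rotate camera one way
    (0, 2, 0, 0),  # rotate camera the other way
    (1, 0, 1, 0),  # forward + jump
    (1, 1, 0, 0),  # forward + rotate one way
    (1, 2, 0, 0),  # forward + rotate the other way
]


class ReducedActionSpace:
    """Maps a small discrete index onto a full multi-branch action."""

    def __init__(self, actions, branch_sizes):
        # Reject any combination that falls outside the full action space.
        for action in actions:
            if len(action) != len(branch_sizes):
                raise ValueError(f"wrong number of branches: {action}")
            if any(not 0 <= a < n for a, n in zip(action, branch_sizes)):
                raise ValueError(f"branch value out of range: {action}")
        self.actions = list(actions)

    @property
    def n(self):
        # Number of discrete actions the agent actually chooses between.
        return len(self.actions)

    def to_full(self, index):
        # Translate the agent's choice into the tuple the environment expects.
        return self.actions[index]


space = ReducedActionSpace(REDUCED_ACTIONS, BRANCH_SIZES)
```

The agent then outputs an index in `range(space.n)`, and `space.to_full(index)` is what gets passed to the environment. Which combinations to keep is itself a form of domain knowledge, which is relevant to the original question.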


Apart from using an improved Dopamine Rainbow, here is a short list of nuggets of information other people have shared: