Architectures of Round 1 winners

lagerros · June 1, 2019, 12:00am

I’m very interested in how domain-general the architectures in Round 1 were, or whether they relied on techniques like reward shaping (and if so, how heavily). This would help in interpreting the significance of their achievement.

Will the architectures be released at some point? And if not, would it be possible to find out, in broad strokes, the extent to which the winner used hardcoded knowledge?

Leckofunny · June 1, 2019, 8:35am

I’d say a lot of the results that reached a mean floor of 5-6 used the Dopamine Rainbow DQN tutorial.

Some of the Round 1 tricks involved limiting the agent’s action space to 6-8 actions.
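
To make the action-space trick concrete, here is a minimal sketch of how a reduced action set could be mapped back to a multi-discrete environment action. Everything in it is an assumption for illustration: the branch sizes (3, 3, 2, 3), the meaning of each branch, and the particular 8 actions are hypothetical and not taken from any Round 1 entry.

```python
from itertools import product

# Assumed multi-discrete layout: (move, turn, jump, strafe).
# These branch sizes are an illustrative assumption, not a confirmed spec.
BRANCH_SIZES = (3, 3, 2, 3)

# Hypothetical hand-picked subset of 8 actions. Each tuple holds one
# value per branch, where 0 means "do nothing" on that branch.
REDUCED_ACTIONS = [
    (0, 0, 0, 0),  # no-op
    (1, 0, 0, 0),  # move (option 1)
    (2, 0, 0, 0),  # move (option 2)
    (0, 1, 0, 0),  # turn (option 1)
    (0, 2, 0, 0),  # turn (option 2)
    (1, 0, 1, 0),  # move (option 1) + jump
    (1, 1, 0, 0),  # move (option 1) + turn (option 1)
    (1, 2, 0, 0),  # move (option 1) + turn (option 2)
]


def full_action_count(branch_sizes=BRANCH_SIZES):
    """Count every combination in the unrestricted multi-discrete space."""
    return sum(1 for _ in product(*(range(n) for n in branch_sizes)))


def is_valid(action, branch_sizes=BRANCH_SIZES):
    """Check that each branch value lies inside its allowed range."""
    return len(action) == len(branch_sizes) and all(
        0 <= value < size for value, size in zip(action, branch_sizes)
    )


def to_env_action(index, table=REDUCED_ACTIONS):
    """Translate the agent's small discrete index into a full action vector."""
    return list(table[index])


if __name__ == "__main__":
    print("full space:", full_action_count())
    print("reduced space:", len(REDUCED_ACTIONS))
    print("agent index 5 maps to", to_env_action(5))
```

The agent then only has to choose among len(REDUCED_ACTIONS) outputs, and to_env_action converts its choice before the environment step. Which actions to keep is a design decision that would need to be tuned per environment.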

anssi · June 1, 2019, 12:42pm

Apart from using an improved Dopamine Rainbow, here is a short list of nuggets of information other people have shared:

unixpickle’s blog post on hierarchical RL: https://blog.aqnichol.com/2019/04/03/prierarchy-implicit-hierarchies/
Ross Wightman’s different version of Rainbow (got better results than the Dopamine one): https://github.com/rwightman/obstacle-tower-pytorch-rainbow
Joe Booth got level 10 with “pure model-free RL” on standard-ish consumer hardware: https://twitter.com/iAmVidyaGamer/status/1122325568949633024