Multi Agent Setup

#1

Question from Panayiotis :

Given that the goal of the Flatland challenge is to develop multi-agent reinforcement learning techniques, it stands to reason to encourage solutions that are decentralized, scalable, and utilize only local information provided to each train. Reading the description, though, it seems that this is not the case. I would argue that solutions which only utilize “tree” or “local” observations should be favoured over solutions which use “global” observation.

#2

Response by @mohanty :

Hi Panayiotis,

Thanks for your interest in the library and the competition, and for raising valid points.

I agree that the goal is to encourage solutions that are decentralized and scalable. But I have mixed feelings about leveraging only the local information available to the agents when, in almost all cases, the agents can have access to global information if there is a demonstrated use for it.

The good point about local observations (a small grid window around the agent) is that they are clearly scalable, but the downside is that there might not be enough information for agents to do long-term planning in larger grids (especially when we haven't introduced ideas like communication). The good point about the tree-based observation is that it makes the agents much easier to train by providing them much denser information, but the downside, I personally feel, is that it pushes us in the direction of hand-crafting features and, in turn, subconsciously biases the direction of thought for people doing research with the library : one of the reasons why I have always been very vocal internally against using the tree observations. On the other hand, Erik is the person who came up with the first implementation and the first successful experiments using them, and is quite a supporter of the tree-based observations : which work !!
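For concreteness, the "small grid window around the agent" observation can be sketched as a fixed-size crop, zero-padded at the borders. This is a hypothetical illustration of the idea, not the library's actual observation builder; `local_window` and its parameters are made up for this sketch:

```python
import numpy as np

def local_window(grid: np.ndarray, agent_pos: tuple, radius: int = 2) -> np.ndarray:
    """Crop a (2*radius+1) x (2*radius+1) window around the agent,
    zero-padding at the borders.

    The window size is independent of the grid size, which is what makes
    this observation scalable -- and also why it cannot see distant agents.
    """
    padded = np.pad(grid, radius, mode="constant", constant_values=0)
    r, c = agent_pos  # padded index = grid index + radius
    return padded[r:r + 2 * radius + 1, c:c + 2 * radius + 1]

# A toy 6x6 grid with the agent in a corner: the window stays 5x5 regardless.
grid = np.arange(36).reshape(6, 6)
obs = local_window(grid, (0, 5), radius=2)
print(obs.shape)  # (5, 5)
```

Whatever the grid size, each agent's input stays `(2*radius+1)^2` cells, so thousands of agents cost no more per agent than a handful do.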

The global observations are a different story though !!

The only reason I love global observations is that they are *generalizable*, and at the same time (at least theoretically) provide all the information needed to enable complex emergent coordination behaviours among agents.

For smaller grids, of course, having access to global observations makes the problem trivial, and in many cases you might solve the task at hand even with old-school agent-based approaches. But as soon as we start scaling to grid worlds on the order of, say, 10,000x10,000, then IMHO global observations are the most promising observations for training generalizable agents : thousands of agents with small local observations will not have enough information, and the tree-based observations will start becoming restrictive and expensive to compute ! But if we have a generalisable way to digest the global observation, then that could actually scale.

One idea (which we haven't experimented with yet) is to imagine layers of convolutions which convert the global observation into much denser (and lower-dimensional) representations that can then be used for your policy exploration. An approach like that could easily scale to much larger grids !!
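As a rough sketch of that idea (hypothetical, and not something the library implements), each "layer" can be stood in for by a strided average pool that shrinks an `(H, W, C)` global observation by a constant factor; stacking a few such layers compresses even a huge grid into a small dense tensor:

```python
import numpy as np

def downsample(obs: np.ndarray, factor: int = 4) -> np.ndarray:
    """Average-pool an (H, W, C) observation by `factor` in each spatial
    dimension -- a stand-in for one strided convolution layer."""
    h, w, c = obs.shape
    h2, w2 = h // factor, w // factor
    # Tile the spatial grid into factor x factor blocks, then average each block.
    blocks = obs[:h2 * factor, :w2 * factor].reshape(h2, factor, w2, factor, c)
    return blocks.mean(axis=(1, 3))

# A 64x64 global observation with 5 feature channels, compressed twice:
global_obs = np.random.rand(64, 64, 5)
dense = downsample(downsample(global_obs))  # 64 -> 16 -> 4 per side
print(dense.shape)  # (4, 4, 5)
```

The depth of such a stack grows with the logarithm of the grid's side length rather than with the number of cells, which is the scaling property the post alludes to; in a real agent the fixed averaging would be replaced by learned convolution filters.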

Hence, we agreed to let the global observation stay, with the hope that there are enough incentives (and information) available to participants who might want to explore generalisable solutions when we increase the grid sizes to much higher values.

#3

So, we should expect a 10,000x10,000 world size in the next round, right? :open_mouth:

#4

@RomanChernenko: Not yet, but in one of the future versions of the competition, definitely !!!
