Step by step: How I set up everything for the Flatland 2020 challenge

  1. register an account at

  2. register an account at (so that you can push/pull repositories from/to it). instead of registering from scratch, you can authorize with your aicrowd account (recommended).

  3. generate a new SSH key pair for if you did this in the past (for GitHub or other stuff), you may use the public key you already generated before. otherwise, generate one with ssh-keygen -t ed25519 on a linux machine.

here’s the output i got from the command:

user@ubuntu:~$ ssh-keygen -t ed25519
Generating public/private ed25519 key pair.
Enter file in which to save the key (/home/user/.ssh/id_ed25519):[press enter]
Enter passphrase (empty for no passphrase):[leave empty, press enter]
Enter same passphrase again:[press enter]
Your identification has been saved in /home/user/.ssh/id_ed25519.
Your public key has been saved in /home/user/.ssh/
The key fingerprint is:
SHA256:qGBFzFBNDP9s1YcC6RncriUNgRFg+53Ji6DIw3j**** user@ubuntu
The key's randomart image is:
+--[ED25519 256]--+
|  .=**+=o+       |
|   ooo+ =... .   |
|    o .. *o o .  |
|   . . *+++. .   |
|  o . o S+       |

to see your generated public key, enter cat .ssh/ the output should look like ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDHApZr8FPpCfcFlQiW4HXS6TNaplXjDgWd2xNDsxeXV

now go to and paste the content of into the big textbox. press Add Key and you’re done.

In case you don’t understand this step: in public key cryptography, you generate a private-public key pair (one private, one public, as a pair) and give the public key to a third party (in this case gitlab.aicrowd(dot)com). When you connect, you sign a message with the private key (which only you have), and the third party verifies the signature with your public key, so it can be confident that the message was sent by you and you only (since only you have the private key). This relies on the property that although the public key can verify a signature made with the private key, it’s almost impossible to infer the private key from the public key, so you don’t have to worry that a leaked public key would let someone impersonate you.
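The idea in the paragraph above can be sketched with a toy example. This is textbook RSA with tiny primes, not the Ed25519 scheme that ssh-keygen actually uses, and the names sign and verify are hypothetical helpers chosen for illustration. It only shows the shape of the argument: the private number produces a signature, and the public numbers are enough to check it.

```python
# Toy RSA signature with tiny primes. For illustration only:
# real keys are vastly larger and SSH with ed25519 uses a different scheme.
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent (shared with the server)
d = pow(e, -1, phi)          # private exponent (stays on your machine), Python 3.8+


def sign(digest, private_exponent, modulus):
    """Produce a signature for a message digest using the private key."""
    return pow(digest, private_exponent, modulus)


def verify(digest, signature, public_exponent, modulus):
    """Check a signature using only the public key."""
    return pow(signature, public_exponent, modulus) == digest


digest = 1234                # stand-in for a hash of the message, must be < n
signature = sign(digest, d, n)

print(verify(digest, signature, e, n))       # genuine message and signature
print(verify(digest + 1, signature, e, n))   # tampered message is rejected
```

Anyone holding (e, n) can run verify, but producing a valid signature requires d, which is why handing out the public key is harmless.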

To test whether recognizes you via ssh, enter ssh -T if everything’s good you should see Welcome to GitLab, @Username! in the output.

ssh by default uses the .ssh/id_xxx key pairs, that’s why we don’t have to specify them in the command above.

For more on this please refer to

  1. Clone the Flatland repository to your local machine. git clone

above command will fail if you don’t have the correct key pair on your machine.

I do most of my development on Windows (because i’m not a pure programmer, a lot of my work consists of proprietary software), so i have to copy the key pair files generated within ubuntu onto my windows machine.

simply copy ~/.ssh/id_* to %HOME%/.ssh and you’re good to go.

Side note: Git for Windows locates your key pair via the HOME environment variable (hence the %HOME%), so the actual path may vary. To change it, edit the value of HOME in Edit Environment Variables.

Side note 2: maybe I shouldn’t use Windows for this project at all. We’ll see.

  1. Install the repository as a python module. because the repository updates itself quite frequently, you should pip install -e flatland where -e stands for editable (instead of copying everything into site-packages) such that a git pull on flatland will update the code immediately without reinstallation.

pip will download and install all the dependencies. use a proxy (set https_proxy=http://blahblah:hehe) if you’re in China. use patience otherwise.

‘all the dependencies’ means a LOT of pypi packages(this is a research grade project). You may want to use virtualenv to avoid polluting your python workspace.
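To check that the editable install really points at your clone rather than at a copy in site-packages, one option is to ask Python where it would load the module from. This is a minimal sketch: module_location is a hypothetical helper name, and it assumes the package is importable under the name flatland.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be loaded from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# With an editable install this should print a path inside your git clone,
# not inside site-packages. It prints None if the module is not installed.
print(module_location("flatland"))
```

Running it inside and outside the virtualenv is also a quick way to see which environment actually has the package.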

  1. type flatland-demo in your terminal. if everything went well you should see a bunch of trains moving in a grid world.

aside from the aspect ratio and the very low FPS, everything seemed fine.

(to be continued)

  1. run the test suite (python test). mine went well with no errors.

  2. cd examples and python you should see a huge train network animating on your screen.

to stop the script from saving all the frames to your local disk, comment out the line starting with

(To be continued)


I did not participate in the 2019 flatland competition, so everything is pretty new to me. Let’s stay away from the code for a while, and read the rules a little bit.

to summarize:

  • You have a finite-size rectangular grid world.
  • each cell has 4 entries (north, south, west, east) and 4 exits (same as the entries). an agent (the train) may enter the cell from one of the 4 entries (example: enter from south, facing north) and leave from one of the 4 exits (example: facing north but leave from east). there are a total of 16 ways of entering then exiting a particular cell.
  • only a few of the 16 moves are possible for each cell, expressed using a 16-bit mask for each cell. so to get from one place to another, the choices are limited.
  • multiple agents are placed inside the grid world. at each timestep they may move to the next cell (if allowed by the 16-bit mask) or stay still.
  • agents have their own goals at certain locations, and their job is to move themselves to their goals. each agent is given -1 reward for each timestep that passes while it has not reached its goal.
  • for example, if 5 agents spent 500 steps in the grid world and none of them reached its destination, their cumulative reward is -2500. if they all reach their destination in one step, then for the rest of the episode (the documentation didn’t say what would happen after every agent reaches its goal, sigh), they will each receive 0 reward for each timestep passed, resulting in a cumulative reward of 0.
  • to normalize across different numbers of agents / different episode lengths, the scores are further divided by (the total number of steps * the total number of agents). so you get -1 score if you cannot get any of the agents to their destination, or 0 score if you can get all of them to their destination in no time.
  • 2 agents cannot be in the same cell at any time.
  • agents may fail randomly for a few timesteps before they recover. the optimal strategy when a failure is ahead thus depends on the estimated time of recovery (short failure = better wait behind, long failure = better make a detour).
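To make the 16-bit mask from the list above concrete, here is a minimal sketch of how such a mask could be decoded. The bit layout is an assumption made for illustration (four 4-bit groups, one per heading in the order N, E, S, W, each group listing the allowed exits in the same order, most significant bit first), and allowed_exits is a hypothetical helper, not a function from the Flatland codebase.

```python
DIRECTIONS = ("N", "E", "S", "W")  # assumed ordering, for illustration only


def allowed_exits(mask, heading):
    """List the exit directions allowed for an agent facing `heading` (0=N, 1=E, 2=S, 3=W).

    Assumed layout: the most significant 4 bits belong to heading N, the next
    4 to E, then S, then W. Within each group the bits are exits N, E, S, W.
    """
    group = (mask >> ((3 - heading) * 4)) & 0xF
    return [DIRECTIONS[i] for i in range(4) if (group >> (3 - i)) & 1]


# A straight north-south track under the assumed layout:
# facing north you may continue north, facing south you may continue south.
straight_ns = 0b1000_0000_0010_0000

print(allowed_exits(straight_ns, 0))  # facing north
print(allowed_exits(straight_ns, 1))  # facing east: no way through
print(allowed_exits(straight_ns, 2))  # facing south
```

Most of the 16 bits are zero for ordinary track, which is why the choices at each cell are so limited.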

as discussed above, per , the rewards are summed from the agents, generating a score.

but per , the rewards are split into local rewards (as discussed above), and global rewards, which is 1 reward per timestep, given to every agent only after all of them reach their destinations.

too much ambiguity. i’m confused.

  1. i wonder how exactly the competition is scored: does it include the global reward or not?
  2. if the competition scoring = sum of agent rewards, then there’s no need to introduce a global reward.
  3. i wonder whether the episode stops after every agent reaches its destination.

Thanks for your detailed review :smiley:


  • Each agent individually gets a local score (at each step: -1 if not at target or 0 if at target) + a global score (at each step: 1 if all agents are at target, 0 otherwise)
  • The competition scoring is the sum of agent rewards. So indeed the global reward adds n_agents * 1 to the score, since each agent gets it
  • The episode stops after all the agents have reached their destination. So effectively you only get the global reward once
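The three points above can be turned into a small scoring sketch. It is not the evaluator's code: episode_score is a hypothetical helper, arrival steps are assumed to be at least 1, an agent is assumed to count as "at target" from the step it arrives, and the normalization divides by (max steps * number of agents) as described earlier in the thread.

```python
def episode_score(arrival_steps, max_steps):
    """Sum local and global rewards over one episode.

    arrival_steps: one entry per agent, the timestep (>= 1) at which it reached
    its target, or None if it never did. Returns (total, normalized).
    """
    n_agents = len(arrival_steps)
    all_done = all(step is not None for step in arrival_steps)
    # The episode ends when everyone has arrived, or at max_steps otherwise.
    episode_len = max(arrival_steps) if all_done else max_steps

    total = 0
    for t in range(1, episode_len + 1):
        for step in arrival_steps:
            at_target = step is not None and t >= step
            total += 0 if at_target else -1      # local reward
        if all_done and t == episode_len:
            total += n_agents                    # global reward, once per agent
    return total, total / (max_steps * n_agents)


# The example from earlier in the thread: 5 agents, 500 steps, nobody arrives.
print(episode_score([None] * 5, 500))
```

Because the episode stops as soon as the last agent arrives, the global term contributes n_agents exactly once, which matches the organizer's description.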

Generally, we refer to the whole grid world as the grid, and to each position in this grid as a “cell”.

I’ve added that episodes finish when either the max time step is reached or all trains have reached their target, good catch!

thank you for making that clear. so the global reward will always be 1 per episode no matter how fast i got the agents to their destinations, as long as i got them to their destinations.

therefore in theory, if 5 agents all get to their destination in no time in a 500-step episode, the maximum achievable score per should be 0.002 (5 / (500 * 5), using the normalization above) instead of 0.

I will now change all references of ‘grid’ to ‘cell’.


now let’s get back to code.


  1. clone the submission template repository
git clone
cd neurips2020-flatland-starter-kit

after that they recommend testing the setup locally(run the evaluator on a local machine). this requires an installation of Redis, so I’ll skip this step for now.

not sure why Redis though. HTTP would work just fine.

  1. create an empty repository on

head to and type flatland_submission into project name, then press Create project, to create an empty repository.

  1. add a remote (an alias for a remote git repository address) to the submission template repository that points to the newly created empty repository.
    git remote add aicrowd

  2. push the entire repository onto the newly created remote.
    git push aicrowd master

In other words, we downloaded everything from the template repo and uploaded them into our newly created empty repo.

  1. create a submission to aicrowd, the git way

you may push to your repo a few times but nothing would happen. to notify the aicrowd-bot that our repo is ready for a submission, we have to create a tag and push:

git tag submission-0
git push aicrowd submission-0

the output reads:

>git push aicrowd submission-0
Total 0 (delta 0), reused 0 (delta 0)
 * [new tag]         submission-0 -> submission-0

in git terms, a tag is like a branch, except it’s a fixed pointer/alias to one commit: it doesn’t move when you make new commits, so you can’t really commit to it like a real branch.

now the aicrowd-bot should see the tag and start to treat our new repo as a submission to the competition, evaluating the code in a cloud environment.

per documentation, the tag has to start with submission-. the content within aicrowd.json tells the bot which competition this submission is intended for.
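Since every new submission needs a fresh tag with that prefix, a tiny helper can pick the next name. This is only a sketch: next_submission_tag is hypothetical, and the numeric suffix is just the convention used in this thread (submission-0); the only requirement stated above is the submission- prefix.

```python
import re

# Matches tags of the form submission-<number>; other tags are ignored.
TAG_PATTERN = re.compile(r"^submission-(\d+)$")


def next_submission_tag(existing_tags):
    """Return the next unused submission-<n> tag given the tags already in the repo."""
    numbers = []
    for tag in existing_tags:
        match = TAG_PATTERN.match(tag)
        if match:
            numbers.append(int(match.group(1)))
    next_number = max(numbers) + 1 if numbers else 0
    return f"submission-{next_number}"


print(next_submission_tag(["submission-0", "v1.0"]))
```

The list of existing tags can be taken from the output of git tag; the result is then used with git tag and git push aicrowd as shown above.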

  1. check submission status

Just in case, please make sure you’re already a participant of the competition. go to and press Participate to become a participant (and agree to a bunch of legal terms).

after pushing the submission-0 tag, visit and you should see there’s a new issue:

  1. wait for the cloud to do its job

the evaluation process is slow, so better do it on the cloud than on our own machines.

Before fully diving in, I’d like to share some thoughts on this competition.

Operations Research rocks, to the extent that this year’s organizer has to make specific rules about it. if not OR, then what?

The flatland is a grid world, but not a dense grid - the rails and switches are relatively sparse, just like in the real world. also, the physical location of a particular piece of rail/switch doesn’t matter - only the topology matters.

as a general rule, for any learning algorithm to be effective, the input representation must be made compact (easy to consume) and permutation invariant, and in the context of flatland, this means transforming the pixelated cells into connectivity graphs.

instead of training a network that can recursively solve a graph, my idea is that we can do it the human way: use a pathfinding algorithm to generate a bunch of candidate paths for each agent, and let the neural network decide which one to use, just like how humans would use their phone for navigation.

this way, any deviation between the pathfinding candidates and the real-world optimal strategy is left for the neural network to learn and adapt.
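As a rough sketch of the "generate a bunch of candidate paths" part, assuming the rail network has already been turned into a directed connectivity graph (the dict-of-lists format and the node names here are made up for illustration), a brute-force enumeration of simple paths could look like this. candidate_paths is a hypothetical helper, and a real implementation would want something smarter than exhaustive search.

```python
def candidate_paths(graph, start, goal, k=3):
    """Return up to k loop-free paths from start to goal, shortest first.

    graph: dict mapping each node to a list of nodes reachable from it.
    Exhaustive depth-first search, so only suitable for small graphs.
    """
    paths = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            paths.append(path)
            continue
        for neighbour in graph.get(node, ()):
            if neighbour not in path:            # never revisit a node
                stack.append((neighbour, path + [neighbour]))
    paths.sort(key=len)
    return paths[:k]


# A tiny made-up network: two short routes and one detour from A to D.
rail_graph = {"A": ["B", "C"], "B": ["D"], "C": ["B", "D"], "D": []}

for path in candidate_paths(rail_graph, "A", "D"):
    print(" -> ".join(path))
```

The network would then only have to score or pick among these few candidates per agent, instead of choosing a move at every cell.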


now my submission has been graded. they even generated gifs for the evaluation process.

Okay, let’s now interpret the results!

Mean percentage of done-Agents : -97.82%
Mean Reward : 0.02
Mean Normalized Reward : -0.98
  • on average, out of every 100 trains I controlled, 197 of them didn’t reach their destination. makes no sense.
  • If a lot of my trains didn’t reach their destination, I should receive a Mean Normalized Reward close to -1, which makes sense.
  • but then Mean Reward shouldn’t equal 0.02. makes no sense again.

turns out the data on is correct. so they must have made a mistake in the order of variables in the code of their bot.


Correct! Well, the trains need to move at least from the starting point to the target, so that’s at least one timestep.

Yes there’s currently a bug in what is displayed in the issue after the evaluation is complete, we’re looking into it! you can get the correct numbers in the issue during training, and then on the leaderboard and individual submission pages.