🚂 Here comes Round 1!

Thank you everyone for your participation and enthusiasm during the Warm-up Round!
We have been very impressed by the quality of the submissions so far, and by the activity around this challenge both on AIcrowd and on other platforms :star_struck:

Here are the changes in Round 1:

  • The 400 evaluation environments will remain the same as during the Warm-up Round. However, the full specifications of these environments are now public: width, height, number of agents, malfunction interval… The only things we are not disclosing are the seeds. This makes it easier to optimize your agents to be as efficient as possible within the evaluation time limit (8 hours). You can recreate environments matching these specifications locally, as shown in the sketch after this list.

  • The 5-second per-timestep time limit is now less harsh. Previously, an agent that took too long to act caused the whole submission to fail. From now on, only the current episode is affected: it receives a score of -1.0 and the evaluation proceeds. The same happens if you go beyond the 5-minute time limit for initial planning. The overall 8-hour time limit, on the other hand, remains a “hard limit” that will still cause the whole submission to fail.

  • Debug submissions are now limited to 48 minutes. They were previously limited to 8 hours, the same as full submissions. This way, submitting in debug mode gives you an early indication of whether your submission would complete a full evaluation in time.
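Since the specifications are public, you can reproduce comparable environments locally. Here is a minimal sketch assuming the flatland-rl 2.x API; the generator parameters shown are illustrative placeholders, not the actual evaluation settings:

```python
from flatland.envs.rail_env import RailEnv
from flatland.envs.rail_generators import sparse_rail_generator
from flatland.envs.schedule_generators import sparse_schedule_generator
from flatland.envs.malfunction_generators import (
    MalfunctionParameters,
    malfunction_from_params,
)

# Illustrative values: substitute the published specs of the environment
# you want to mirror (width, height, number of agents, malfunction interval).
malfunction_params = MalfunctionParameters(
    malfunction_rate=1 / 1000,  # expected malfunctions per timestep
    min_duration=20,
    max_duration=50,
)

env = RailEnv(
    width=25,
    height=25,
    number_of_agents=5,
    rail_generator=sparse_rail_generator(max_num_cities=3),
    schedule_generator=sparse_schedule_generator(),
    malfunction_generator_and_process_data=malfunction_from_params(malfunction_params),
)
observation, info = env.reset()
```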

Besides these changes, we are happy to release the Flatland RLlib baselines!

:blue_book:Doc: https://flatland.aicrowd.com/research/baselines.html
:card_index:Repo: https://gitlab.aicrowd.com/flatland/neurips2020-flatland-baselines

You can now train agents using advanced methods such as Ape-X and PPO, along with “tricks” such as action masking and action skipping. We also provide imitation learning baselines such as MARWIL and DQfD, which leverage expert demonstrations generated using last year’s top solutions to train RL agents.

RLlib allows you to scale up training to large machines, or even across multiple machines. It also makes it trivial to run hyperparameter searches. We are still actively working on these baselines and encourage you to take part in their development! :toolbox::wrench:
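For instance, a training run with a small hyperparameter search might look roughly like the sketch below. This is an assumption-laden illustration: the env id "flatland_sparse" and the config values are placeholders, and the baselines repo ships ready-made experiment files that should be preferred in practice.

```python
import ray
from ray import tune

ray.init()

tune.run(
    "PPO",
    config={
        "env": "flatland_sparse",              # env id assumed; see the baselines repo
        "num_workers": 4,                      # scale up rollout collection
        "lr": tune.grid_search([1e-4, 5e-4]),  # trivial hyperparameter search
    },
    stop={"timesteps_total": 1_000_000},
)
```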

Some details about how the new timeouts work:

  • During evaluation, your submission should catch the StopAsyncIteration exception raised by remote_client.env_step(action) when a step times out. If this exception is raised, you should create a new environment by calling remote_client.env_create() before going further (see the sketch after this list).

  • The submission will still fully fail after 10 consecutive timeouts. This is to prevent submissions from running for 8 hours after the agent has crashed.
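Putting both points together, a minimal sketch of an evaluation loop that handles these timeouts could look like this. The DummyObservationBuilder and the move-forward policy are placeholders for your own observation builder and planner:

```python
from flatland.core.env_observation_builder import DummyObservationBuilder
from flatland.evaluators.client import FlatlandRemoteClient

remote_client = FlatlandRemoteClient()
obs_builder = DummyObservationBuilder()  # replace with your own observation builder

while True:
    # Fetch the next evaluation episode; observation is falsy once
    # all episodes have been evaluated.
    observation, info = remote_client.env_create(obs_builder_object=obs_builder)
    if not observation:
        break

    number_of_agents = len(remote_client.env.agents)

    while True:
        # Placeholder policy: move every agent forward (action 2).
        action = {agent: 2 for agent in range(number_of_agents)}
        try:
            observation, all_rewards, done, info = remote_client.env_step(action)
        except StopAsyncIteration:
            # The step timed out: this episode is scored -1.0 and the
            # evaluation moves on to the next environment.
            break
        if done["__all__"]:
            break

remote_client.submit()
```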