How is this challenge different from last year?

The top three solutions to last year’s challenge obtained very good results. Is there still significant room for improvement?
Wello on Discord

Good question!

First, if you want to check out the top solutions from last year, they are available here:

The difference from last year is that the agents now need to act within strict time limits:

  • agents have up to 5 minutes to perform initial planning (i.e. before performing any action)
  • agents have up to 5 seconds to act per timestep (5 seconds in total for all the agents)
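One way to stay inside these limits is to track the remaining wall-clock time explicitly and fall back to a cheap default action once it runs out. The following is a minimal sketch only: `TimeBudget`, `choose_actions`, `pick_action`, `default_action` and the 0.5 second safety margin are hypothetical names and values chosen for illustration, not part of the challenge API.

```python
import time


class TimeBudget:
    """Tracks how much of a fixed wall-clock budget remains."""

    def __init__(self, seconds, clock=time.monotonic):
        self._clock = clock
        self._deadline = clock() + seconds

    def remaining(self):
        # Never report negative time.
        return max(0.0, self._deadline - self._clock())

    def expired(self):
        return self.remaining() <= 0.0


PLANNING_LIMIT_S = 5 * 60  # initial planning limit stated above
STEP_LIMIT_S = 5           # per-timestep limit, shared by all agents
SAFETY_MARGIN_S = 0.5      # assumption: headroom picked for illustration


def choose_actions(agent_ids, pick_action, default_action, budget):
    """Pick one action per agent; use the default once the budget is spent."""
    actions = {}
    for agent_id in agent_ids:
        if budget.expired():
            actions[agent_id] = default_action
        else:
            actions[agent_id] = pick_action(agent_id)
    return actions
```

At each timestep you would create a fresh `TimeBudget(STEP_LIMIT_S - SAFETY_MARGIN_S)` and pass it to `choose_actions`. The budget is only checked between agents, so a single slow `pick_action` call can still overrun; the margin is there to absorb that.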

This comes from a real-life problem: if a train breaks down somewhere in the railway network, you need to re-schedule all the other trains as fast as possible to minimize delays.

Last year, most solutions used operations research approaches. These methods are very good at finding optimal train schedules, but they don’t scale well to large environments: they quickly take too long to run.

This is why we are encouraging people to use reinforcement learning solutions this year, as we believe this will allow faster scheduling. The idea is that in the real world, a fast planning method that provides an approximate solution is preferable to one that produces a perfect plan but takes hours to compute it.

TL;DR: This year, we added more aggressive time limits to make the problem more realistic. This will give an edge to RL solutions.


Those constraints give plenty of time to precompute the routes and compete with rule-based agents. Also, 5 seconds is generally more than enough to find a perfect path, though maybe not with Python. I don’t think you should adjust the constraints based on this fact, though.

The issue might be that participants can’t find a reinforcement learning algorithm strong enough to beat rule-based agents. Also, the rewards favoring RL might be enough to encourage the switch.

Indeed, 5 minutes should be enough to pre-compute a perfect path in most cases (although… don’t underestimate how large the test environments might get…).

But then trains will hit malfunctions, forcing you to recompute the routes. 5 seconds will make it harder to re-compute everything!
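A common way to cope with this is to replan incrementally: update only the trains affected by the malfunction, in priority order, and let everyone else keep their old route when time runs out. Here is a rough sketch of that idea, not a reference solution. `replan_within_budget`, `replan_one` and the plan representation are hypothetical and stand in for whatever planner you actually use.

```python
import time


def replan_within_budget(plans, affected, replan_one, budget_s,
                         clock=time.monotonic):
    """Replan affected agents one by one until the time budget is used up.

    `plans` maps agent id -> current plan and is updated in place.
    `affected` lists agent ids in priority order (most urgent first).
    `replan_one(agent_id, plans)` is a hypothetical planner callback.
    Returns the agent ids that were not reached and keep their old plan.
    """
    deadline = clock() + budget_s
    pending = list(affected)
    # The deadline is checked before each replan, so one slow call can
    # still overshoot it; pass a budget below the hard limit.
    while pending and clock() < deadline:
        agent_id = pending.pop(0)
        plans[agent_id] = replan_one(agent_id, plans)
    return pending
```

The returned list can be carried over to the next timestep, so the remaining trains get replanned as soon as there is time again.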

Finally, the timing constraints as well as the environment sizes may be adjusted from round to round. So you should design your solution taking into account that time per timestep will be scarce, and environments will be huge.

Are there any maximum size limits for the map?

For Round 1, from the FAQ:

  • (x_dim, y_dim) <= (150, 150)
  • n_agents <= 400
  • malfunction_rate <= 1/50

These parameters are subject to change during the challenge.
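If you generate your own training environments, it can help to keep the Round 1 limits above in one place and check configurations against them. This is just an illustrative sketch: the field names follow the FAQ list, while `EnvConfig`, `ROUND_1_MAX` and `within_round_1_limits` are hypothetical helpers and not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    x_dim: int
    y_dim: int
    n_agents: int
    malfunction_rate: float


# Upper bounds quoted from the FAQ for Round 1; they may change later.
ROUND_1_MAX = EnvConfig(x_dim=150, y_dim=150, n_agents=400,
                        malfunction_rate=1 / 50)


def within_round_1_limits(cfg, limits=ROUND_1_MAX):
    """Return True if every parameter is at or below its Round 1 bound."""
    return (
        cfg.x_dim <= limits.x_dim
        and cfg.y_dim <= limits.y_dim
        and cfg.n_agents <= limits.n_agents
        and cfg.malfunction_rate <= limits.malfunction_rate
    )
```

Since the limits may change between rounds, passing a different `limits` value is enough to re-check your configurations.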