The top three solutions to last year’s challenge obtained very good results. Is there still significant room for improvement?
– Wello on Discord
First, if you want to check out the top solutions from last year, they are available here:
The difference from last year is that the agents now need to act within strict time limits:
- agents have up to 5 minutes to perform initial planning (i.e., before performing any action)
- agents have up to 5 seconds to act per timestep (5 seconds in total across all the agents)
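In practice, respecting a shared per-timestep budget means tracking a deadline while collecting actions. Here is a minimal sketch of that idea; the `agents`/`observations` structures and the fallback action `0` are hypothetical stand-ins, not the actual Flatland API:

```python
import time

def act_with_budget(agents, observations, budget_s=5.0):
    """Collect actions for all agents within one shared time budget.

    `agents` is a list of callables mapping an observation to an action
    (a hypothetical interface). If the budget runs out mid-loop, the
    remaining agents fall back to a cheap default action (here: 0).
    """
    deadline = time.monotonic() + budget_s
    actions = {}
    for handle, agent in enumerate(agents):
        if time.monotonic() >= deadline:
            actions[handle] = 0  # budget exhausted: default action
        else:
            actions[handle] = agent(observations[handle])
    return actions
```

The key design point is that the budget is shared across all agents, so a slow policy for one train eats into the time available for the others.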
This comes from a real-life problem: if a train breaks down somewhere in the railway network, you need to re-schedule all the other trains as fast as possible to minimize delays.
Last year, most solutions used operations research approaches. These methods are very good at finding optimal train schedules, but they don’t scale well to large environments: they quickly take too long to run.
This is why we are encouraging people to use reinforcement learning solutions this year, as we believe this will allow faster scheduling. The idea is that in the real world, it is better to have a fast planning method that provides an approximate solution than a method that provides a perfect schedule but takes hours to compute it.
TL;DR: This year, we added more aggressive time limits to make the problem more realistic. This will give an edge to RL solutions.