🚑 Addressing Round 1 pain points

Thank you, everyone, for your feedback on Round 1! Here’s a summary of the problems encountered so far and how we plan to address them.

TL;DR: Round 2 will be similar to Round 1, but with many more environments. The 8-hour overall time limit won’t cause submissions to fail anymore. Prizes will be announced soon. Reported bugs are being fixed. Round 2 is pushed back by one week while we address all the feedback.

The 8-hour overall time limit is too strict! :timer_clock:
This is the most common problem: it’s very hard to get an RL solution to finish in time.

To fix this, we will make this time limit a “soft timeout”: if your submission takes more than 8 hours, it won’t be cancelled anymore; instead, all the remaining episodes that it didn’t have time to solve will receive a score of -1.0.

To make this process fair, the order of the evaluation environments will be fixed, and they will be sorted by increasing size.
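The soft-timeout scoring could be sketched along these lines. This is only an illustration: the fixed, size-sorted episode order, the 8-hour budget, and the -1.0 default come from this post, but the function names and data structures are assumptions, not the actual evaluator.

```python
import time

DEADLINE_SECONDS = 8 * 60 * 60  # soft overall time limit
TIMEOUT_SCORE = -1.0            # score for episodes left unsolved

def evaluate(episodes, solve, deadline=DEADLINE_SECONDS):
    """Run episodes in a fixed order (sorted by increasing size).

    Episodes started after the soft deadline are not run at all;
    they simply receive TIMEOUT_SCORE instead of causing the whole
    submission to fail.
    """
    start = time.monotonic()
    scores = []
    for episode in sorted(episodes, key=lambda e: e["size"]):
        if time.monotonic() - start > deadline:
            scores.append(TIMEOUT_SCORE)  # out of time: default score
        else:
            scores.append(solve(episode))
    return scores
```

Because the small environments come first, a submission that runs out of time still banks its scores on everything it managed to solve before the deadline.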

The environment is too slow :snail:
The Flatland environment does get slow when running larger environments!

This is a problem in two situations. First, for submissions: a slow environment could push solutions over the 8-hour overall time limit. Now that this limit is “soft”, this is much less of a problem. The environment will still take a large chunk of the time during the evaluation process, but your submission will remain valid even if it takes too long, and since the environment takes the same amount of time for all participants, things stay fair.

Still, the speed of the environment limits how fast you can train new agents and experiment with new ideas. We will release a new version that includes a number of performance improvements to alleviate this issue for Round 2.

I don’t want people to see videos of my submissions :see_no_evil:
Some participants have expressed the wish to hide their submission videos.

This is not something we plan to provide. Our goal is to foster open and transparent competition, and showing videos is part of the game: participants can glean some information from them to get new ideas.

One strategy would be to wait until the last minute to “hide your hand”. This is possible but risky, as the number of submissions per day is limited, so it is generally better to secure a good position on the leaderboard as soon as possible!

We still don’t know what the prizes will be! :gift:
The original prizes were travel grants to NeurIPS, but sadly the conference will be fully virtual this year.

This forced us to look for new sponsors for the prizes. While we can’t announce anything yet, things are progressing, and we’re hoping to announce exciting prizes by the time Round 2 starts.

The margin of progression for OR is too small 💇
OR solutions reached a 100% completion rate within days in Round 1, and are now fighting over thousandths of a point. Since the overall time limit is now “soft”, we will simply add many more evaluation episodes, including much larger environments, to allow a larger margin of progression for all solutions.

Documentation is still lacking :books:
Flatland is a complex project that has been developed by dozens of people over the last few years. We have invested a lot of energy to gather all the relevant information at flatland.aicrowd.com, but we realise there is still a lot of work ahead.

We will keep working on this, but this is a large task where your contribution is more than welcome. Contributing to the documentation would make you an official Flatland Contributor! Check out https://flatland.aicrowd.com/misc/contributing.html to see how you can help.

Various bugs are making our lives harder :bug:
Here’s a list of known bugs we plan to squash before Round 2 starts:

  • Debug submissions count the same as full submissions :scream:

  • When a submission is done, the percentages and other metrics reported in the GitLab issues are nonsensical (“-11.36% of agents done”)

  • Rendering bug showing agents in places where they shouldn’t be

We’re hard at work to address all these issues. We have moved the starting date of Round 2 one week back to give us time to implement and deploy all the necessary changes.

We’re still open to comments, complaints and requests! Please fill out the survey if you haven’t done so:


I may be wrong, but below is my feedback about adding many more evaluation episodes:

  • Currently, RL’s completion rate is low even with the current env settings. Adding more episodes may narrow the applicability of RL when competing with OR methods.

  • It may push us to focus more on OR methods.

As I commented before, I think larger envs are good, but it would be better to have far fewer test cases.

Hey @junjie_li,

Quoting from your post here, 🧞 Pain points in Round 1 and wishes for Round 2?:

My wishes for Round 2 are:

  • Use only a few large test cases (for example, number of test cases <= 10), while keeping the same overall running time. It may be even better to test with the same grid size.
  • Use the same speed for all agents. I personally prefer to focus more on RL-related things, instead of dealing with deadlocks caused by different speeds.

I think one of OR’s shortcomings is that it’s not straightforward to optimize for the global reward.
My understanding: RL’s advantage is finding a better solution (possibly combined with OR), not acting in a shorter time.
If we want to see RL perform better than OR, we should give RL enough time for planning/inference on large grid envs (both 5 min and 5 s may not be enough for RL to do planning and inference).

I think I understand your point of view. Indeed, by focusing on a few large environments with RL, the global reward could be better than using OR, as RL can explicitly optimize the reward.
Did I understand your point correctly?

However, the business problem is different. In the real world, OR methods are already very good at finding optimal solutions. The problem is that they take too long to calculate these solutions, and the calculation time explodes with the size of the railway network. This is especially a problem when a train breaks down: people are waiting, so a solution should really be found as fast as possible, even if it’s not completely optimal.

This is why we are introducing this “time vs score” trade-off: in practice, it may be more useful to have a sub-optimal solution that allows the trains to start moving after a few minutes of calculation, rather than having to wait an hour for a perfect solution. Similarly, in Round 2 your solution can be faster and come up with imperfect solutions, yet still potentially accumulate more points.

We are hoping that RL can help move the needle here, as the agents could potentially keep moving without having to calculate a full planning until the end, therefore finding an approximate solution faster!
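The reasoning behind the “time vs score” trade-off can be illustrated with a toy calculation. All figures here are hypothetical and do not come from the actual competition scoring; the point is only that planning time itself contributes to the delay passengers experience.

```python
def total_delay(planning_minutes, schedule_delay_minutes):
    """Total delay experienced: time trains sit idle while the solver
    computes, plus the delay baked into the schedule it finds."""
    return planning_minutes + schedule_delay_minutes

# A perfect schedule found after an hour of computation...
slow_optimal = total_delay(planning_minutes=60, schedule_delay_minutes=0)

# ...versus a slightly worse schedule found in five minutes.
fast_approx = total_delay(planning_minutes=5, schedule_delay_minutes=10)

assert fast_approx < slow_optimal  # the approximate solution wins overall
```

Under this (simplified) view, an agent that starts moving trains quickly can beat a slower optimal planner, which is exactly the opening we hope RL can exploit.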


Hi @MasterScrat, thanks for the kind reply and explanation.

As no other teams (using RL) share similar concerns, please move forward.
