🧞 Pain points in Round 1 and wishes for Round 2?

With 7 days to go in Round 1, what have been the major pain points so far? What would you want to see improved in Round 2?

Edit: please fill out the survey to help us understand what we can improve!

Hi!
Thank you for this competition!
The major pain point was the short time allowed for submissions (even for debug submissions).
For example, if we take neurips2020-flatland-starter-kit and change CustomObservationBuilder to TreeObsForRailEnv with max_depth=2, we will get the following results for the biggest test (Test_13):
  • Mean time per step: 0.5 s
  • Number of steps: 2448
That is about 20 minutes for this one test, and there are 2 tests of this size. That makes 40 minutes, plus the remaining 26 tests (some of which are fairly large), against a 48-minute timeout for debug submissions.
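The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes both large tests take about as long as Test_13 and uses the 0.5 s/step and 2448-step figures reported above.

```python
# Back-of-the-envelope time budget for a debug submission, using the figures
# reported above for Test_13 with TreeObsForRailEnv(max_depth=2).
SECONDS_PER_STEP = 0.5       # mean time per step measured on Test_13
STEPS_PER_EPISODE = 2448     # number of steps in Test_13
LARGE_TESTS = 2              # assumption: both large tests cost about the same
DEBUG_TIMEOUT_MIN = 48       # debug submission timeout, in minutes

minutes_per_test = SECONDS_PER_STEP * STEPS_PER_EPISODE / 60
minutes_large = LARGE_TESTS * minutes_per_test
remaining_min = DEBUG_TIMEOUT_MIN - minutes_large

print(f"one large test:  {minutes_per_test:.1f} min")   # 20.4 min
print(f"two large tests: {minutes_large:.1f} min")      # 40.8 min
print(f"left for the other 26 tests: {remaining_min:.1f} min")  # 7.2 min
```

In other words, roughly 7 minutes would be left for the other 26 tests.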
I have seen the related discussion, but the suggested solution

If you perform steps with no actions for the whole episode (ie env.step({}) ), you will very quickly reach the end of that episode.

would lead to the same result, because the _get_observations method of rail_env is executed regardless of the actions.

It would be great to have more overall time for submissions!

Thanks for starting this discussion thread.

As a participant who is really interested in using RL to solve this problem, my concerns are:

  • Timing. When we use RL, we likely need a GPU for inference. Unfortunately, GPU utilization will be low, since it only serves one or a few states per batch. So I expect that for larger grid sizes, RL on a GPU is likely to be less efficient than OR methods.
  • Diversity of environments. With 14 different grid sizes, RL training becomes harder. If we also consider different speeds, deadlock-free planning may require even more effort.
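To make the batching point concrete, here is a minimal NumPy sketch, assuming a hypothetical linear policy as a CPU stand-in for a GPU-hosted network (the sizes OBS_DIM, N_ACTIONS and N_AGENTS are also made up). Stacking all agents' observations into one forward pass gives the same actions as one call per agent, but the batch can never be larger than the number of agents, which is why GPU utilization stays low.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, N_ACTIONS, N_AGENTS = 11, 5, 8  # hypothetical sizes

# Hypothetical linear policy standing in for a neural network on a GPU.
W = rng.standard_normal((OBS_DIM, N_ACTIONS))
b = rng.standard_normal(N_ACTIONS)


def policy(batch: np.ndarray) -> np.ndarray:
    """One 'forward pass': maps a (batch, OBS_DIM) array to action indices."""
    logits = batch @ W + b
    return logits.argmax(axis=1)


observations = {h: rng.standard_normal(OBS_DIM) for h in range(N_AGENTS)}

# Per-agent inference: N_AGENTS forward passes, each with batch size 1.
per_agent = {h: int(policy(obs[None, :])[0]) for h, obs in observations.items()}

# Batched inference: all agents' observations stacked into a single pass.
handles = sorted(observations)
batch = np.stack([observations[h] for h in handles])
batched = dict(zip(handles, policy(batch).tolist()))

# Same actions, one forward pass instead of N_AGENTS, but batch size <= N_AGENTS.
assert per_agent == batched
```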

My wishes for Round 2 are:

  • Use only a few large test cases (for example, 10 or fewer) while keeping the same overall running time. It may be even better to test with a single grid size.
  • Use the same speed for all agents. I personally prefer to focus on the RL side rather than on dealing with deadlocks caused by different speeds.

I think one of OR’s shortcomings is that it is not straightforward to optimize for the global reward.
My understanding is that RL’s advantage lies in finding better solutions (in combination with OR), not in acting faster.
If we want to see RL perform better than OR, we should give RL enough time for planning and inference on large grid environments (both the 5-minute and the 5-second limits may not be enough for RL to plan and infer).

About this:

The trick is to use a dummy observation builder, which takes no time, and to build the observations by calling the actual observation builder yourself when needed by calling observation_builder.get_many()
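A minimal sketch of this trick, written against hypothetical stand-in classes (ExpensiveObsBuilder, DummyObsBuilder, ToyEnv) rather than the real Flatland API. Like rail_env, the toy env builds observations on every step() regardless of the actions, so env.step({}) alone saves nothing. Putting a dummy builder inside the env and calling the expensive builder's get_many() yourself only when a decision is needed does. The "every 10 steps" rule is an arbitrary placeholder for a real decision condition, and a real builder would also need access to the environment state, which this toy omits.

```python
from typing import Dict, List


class ExpensiveObsBuilder:
    """Hypothetical stand-in for TreeObsForRailEnv; counts how often it runs."""

    def __init__(self) -> None:
        self.calls = 0

    def get_many(self, handles: List[int]) -> Dict[int, tuple]:
        self.calls += 1
        return {h: ("tree", h) for h in handles}


class DummyObsBuilder:
    """Returns a trivial observation, so it costs almost nothing per step."""

    def get_many(self, handles: List[int]) -> Dict[int, bool]:
        return {h: True for h in handles}


class ToyEnv:
    """Hypothetical env: step() always builds observations, whatever the
    actions are (even an empty dict), mimicking the behavior described above."""

    def __init__(self, obs_builder, n_agents: int = 3, max_steps: int = 100) -> None:
        self.obs_builder = obs_builder
        self.n_agents = n_agents
        self.max_steps = max_steps
        self.t = 0

    def step(self, actions: Dict[int, int]):
        self.t += 1
        obs = self.obs_builder.get_many(list(range(self.n_agents)))
        return obs, self.t >= self.max_steps


def run(env: ToyEnv, tree_builder: ExpensiveObsBuilder, every: int) -> None:
    """Steps to the end, building real observations only every `every` steps."""
    done = False
    while not done:
        actions: Dict[int, int] = {}
        if env.t % every == 0:  # placeholder for "a decision is needed"
            obs = tree_builder.get_many(list(range(env.n_agents)))
            actions = {h: 2 for h in obs}  # placeholder policy: same action for all
        _, done = env.step(actions)


# 1) Expensive builder inside the env: it runs on every step, even with no actions.
slow = ExpensiveObsBuilder()
env = ToyEnv(slow)
done = False
while not done:
    _, done = env.step({})
print(slow.calls)  # 100

# 2) Dummy builder inside the env; the expensive builder runs only when needed.
tree = ExpensiveObsBuilder()
run(ToyEnv(DummyObsBuilder()), tree, every=10)
print(tree.calls)  # 10
```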

Thank you @akleban and @junjie_li for your answers!

Regarding the 8 hour time limit, would it solve the issue if this time limit would not cancel the submission when it takes too long, but would instead give a score of -1.0 to all the environments that have not been solved in time?

Did you have problems with the 5 min and 5 seconds time limits? What do you think would be reasonable time limits to use instead?

@junjie_li I understand that these two points are making things harder:

  • large variety of environments
  • potentially different train speeds in Round 2

However, these are part of the business problem that SBB and Deutsche Bahn are facing and that we are trying to solve. We need to strike a balance between making the challenge feasible and interesting, and keeping it close enough to the real-world problem that the results are useful!

Regarding the 8 hour time limit, would it solve the issue if this time limit would not cancel the submission when it takes too long, but would instead give a score of -1.0 to all the environments that have not been solved in time?

Yes, it is a good solution, provided that the simulations are sorted by ascending environment size.
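The proposed scoring rule, combined with ordering by environment size, can be sketched as follows. Everything here is hypothetical: the function name, the tuple layout, and the example runtimes and scores are made-up illustrations, not the actual evaluator.

```python
from typing import Dict, List, Tuple

# (name, size rank, runtime in minutes, score if completed): made-up values
Env = Tuple[str, int, float, float]


def score_with_budget(envs: List[Env], budget_minutes: float) -> Dict[str, float]:
    """Run environments in ascending size order. Instead of cancelling the
    whole submission at the time limit, give -1.0 to every environment that
    could not finish within the budget."""
    scores: Dict[str, float] = {}
    elapsed = 0.0
    out_of_time = False
    for name, _size, runtime, score in sorted(envs, key=lambda e: e[1]):
        if not out_of_time and elapsed + runtime <= budget_minutes:
            elapsed += runtime
            scores[name] = score
        else:
            out_of_time = True  # once the clock runs out, nothing else runs
            scores[name] = -1.0
    return scores


envs = [
    ("Test_13", 13, 20.0, 0.4),
    ("Test_0", 0, 1.0, 0.9),
    ("Test_5", 5, 5.0, 0.7),
    ("Test_12", 12, 18.0, 0.5),
]
print(score_with_budget(envs, budget_minutes=30.0))
# {'Test_0': 0.9, 'Test_5': 0.7, 'Test_12': 0.5, 'Test_13': -1.0}
```

Running the smallest environments first means a timeout only costs the largest, slowest ones; in the opposite order, a single slow environment could push many small, solvable ones past the budget.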