With 7 days to go in Round 1, what have been the major pain points so far? What would you want to see improved in Round 2?
Edit: fill out the survey to help us understand what we can improve!
Hi!
Thank you for this competition!
The major pain point was too little time for submissions (even for the debug one).
For example, if we take neurips2020-flatland-starter-kit and change CustomObservationBuilder to TreeObsForRailEnv with max_depth=2, we get the following results for the biggest test (Test_13):
Mean time per step: 0.5 s
Number of steps: 2448
That is about 20 minutes for one test, and we have 2 of them. So 40 minutes, plus the remaining 26 tests (some of which are quite large), against a 48-minute timeout for debug submissions.
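A quick back-of-the-envelope check of those numbers (a sketch; the figures are the measurements quoted above):

```python
# Measured on Test_13 with TreeObsForRailEnv(max_depth=2)
mean_time_per_step = 0.5          # seconds
n_steps = 2448

per_test = mean_time_per_step * n_steps
print(per_test / 60)              # ~20.4 minutes for one Test_13 run
print(2 * per_test / 60)          # ~40.8 minutes for both runs, leaving
                                  # only ~7 of the 48 timeout minutes
                                  # for the other 26 tests
```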
I have seen the related discussion, but the suggested solution:
"If you perform steps with no actions for the whole episode (i.e. env.step({})), you will very quickly reach the end of that episode."
will lead to the same results, because the _get_observations method of rail_env is executed independently of the actions.
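To illustrate the point: here is a minimal mock (hypothetical stand-in classes, not the actual flatland code) in which step() calls the observation builder unconditionally, mirroring how _get_observations behaves. An empty action dict still pays the full observation cost:

```python
import time

class SlowTreeObs:
    """Hypothetical stand-in for TreeObsForRailEnv: building the
    observations is expensive no matter which actions were taken."""
    def get_many(self, handles):
        time.sleep(0.05)                      # simulate costly tree expansion
        return {h: "obs" for h in handles}

class MockRailEnv:
    """Hypothetical sketch of RailEnv.step(): the observation builder is
    invoked unconditionally, even for an empty action dict."""
    def __init__(self, obs_builder, n_agents=5):
        self.obs_builder = obs_builder
        self.handles = list(range(n_agents))

    def step(self, action_dict):
        # ... apply actions (nothing to do when action_dict is {}) ...
        return self.obs_builder.get_many(self.handles)  # always runs

env = MockRailEnv(SlowTreeObs())
start = time.perf_counter()
obs = env.step({})                            # no actions at all
elapsed = time.perf_counter() - start
print(f"empty step still took {elapsed:.3f}s")
```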
It would be great if we had more overall time for submissions!
Thanks for opening this thread for discussion.
As a participant who is really interested in using RL to solve this problem, my concerns are:
My wishes for Round 2 are:
I think one of OR's shortcomings is that it's not straightforward to optimize for the global reward.
My understanding: RL's advantage is finding a better solution (possibly combined with OR), not acting in a shorter time.
If we want to see RL perform better than OR, we should give RL enough time for planning/inference on large grid environments. (Both 5 min and 5 s may not be enough for RL to do planning and inference.)
About this:
"The trick is to use a dummy observation builder, which takes no time, and to build the observations yourself, when needed, by calling the actual observation builder's observation_builder.get_many()."
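A minimal sketch of that trick (hypothetical stand-in classes, not the real flatland API; in flatland the cheap builder would be something like DummyObservationBuilder and the real one e.g. TreeObsForRailEnv):

```python
class CheapDummyObs:
    """Stand-in for a dummy observation builder: returns a constant
    observation instantly, so every env.step() is cheap."""
    def get_many(self, handles):
        return {h: True for h in handles}

class ExpensiveTreeObs:
    """Stand-in for the real (slow) observation builder."""
    def get_many(self, handles):
        # imagine the costly tree expansion happening here
        return {h: ("tree_obs", h) for h in handles}

class MiniEnv:
    """Hypothetical minimal env whose step() queries its builder."""
    def __init__(self, obs_builder, n_agents=3):
        self.obs_builder = obs_builder
        self.handles = list(range(n_agents))

    def step(self, actions):
        return self.obs_builder.get_many(self.handles)

# Register the cheap dummy builder with the environment...
env = MiniEnv(CheapDummyObs())
real_builder = ExpensiveTreeObs()

cheap_obs = env.step({})                 # fast: dummy observations only
# ...and invoke the real builder yourself only on the steps where the
# policy actually needs fresh observations:
real_obs = real_builder.get_many(env.handles)
```

This way the per-step observation cost is only paid on the steps you choose, rather than on every call to step().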
Thank you @akleban and @junjie_li for your answers!
Regarding the 8-hour time limit, would it solve the issue if, instead of cancelling a submission that takes too long, we gave a score of -1.0 to all the environments that have not been solved in time?
Did you have problems with the 5-minute and 5-second time limits? What do you think would be reasonable time limits to use instead?
@junjie_li I understand that these two points are making things harder:
However, these are part of the business problem SBB and DeutscheBahn are facing and that we are trying to solve. We need to strike a balance between making the challenge feasible/interesting, and keeping it close enough to the real-world problem so results are useful!
Regarding the 8-hour time limit, would it solve the issue if, instead of cancelling a submission that takes too long, we gave a score of -1.0 to all the environments that have not been solved in time?
Yes, that would be a good solution, provided the simulations are sorted by ascending environment size.