For RL to work well, the simulation environments used for training and evaluation should have similar configurations.
To help set up the training environment properly, could you provide some basic information about the evaluation environment?
For example, the range of the following settings:
- width and height of map
- num of trains
- num of cities
- type of city distribution
- speed ratio of trains
- max rails between cities
- max rails in cities
- type of schedule generator
- malfunction: rate, min/max duration.
The goal of this challenge is to design a policy that is able to generalize to any kind of environment. For this reason, we don’t disclose all the details about the evaluation environments.
However, you can get some details about them:
The environments vary in size and number of agents as well as malfunction parameters.
For Round 1 of the NeurIPS 2020 challenge, the upper limits of these variables for submissions are:
(x_dim, y_dim) <= (150, 150)
n_agents <= 400
malfunction_rate <= 1/50
These parameters are subject to change during the challenge.
This gives you an idea of the distribution of evaluation environments you will have to solve when you do a submission.
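One way to act on these limits is to randomize the training configuration within them. The following is only a minimal sketch: the dictionary keys mirror the variable names quoted above, the lower bounds (`MIN_DIM`, `MIN_AGENTS`) are assumptions picked for illustration, and the helper `sample_env_config` is hypothetical rather than part of Flatland.

```python
import random

# Upper limits quoted above for Round 1 (subject to change during the challenge).
MAX_DIM = 150
MAX_AGENTS = 400
MAX_MALFUNCTION_RATE = 1 / 50

# Lower bounds are assumptions chosen for illustration only.
MIN_DIM = 25
MIN_AGENTS = 1


def sample_env_config(rng):
    """Draw one training configuration within the published upper limits."""
    return {
        "x_dim": rng.randint(MIN_DIM, MAX_DIM),
        "y_dim": rng.randint(MIN_DIM, MAX_DIM),
        "n_agents": rng.randint(MIN_AGENTS, MAX_AGENTS),
        # 0.0 means no malfunctions; the upper end matches the quoted limit.
        "malfunction_rate": rng.uniform(0.0, MAX_MALFUNCTION_RATE),
    }


rng = random.Random(0)  # fixed seed so the sampled configs are reproducible
configs = [sample_env_config(rng) for _ in range(5)]
for cfg in configs:
    print(cfg)
```

The sketch samples each variable independently, which can pair a small grid with a very large number of agents. In practice you would probably want to scale the agent count with the grid size, and pass the sampled values to whatever environment constructor you use.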
Speed profiles are not used in the first round of the NeurIPS 2020 challenge.
So you can just set all the trains to a speed of
Thanks @MasterScrat for the kind reply.
May I ask how different Round 2 will be from Round 1?
Consider the example with two different settings:
- when we just need our algorithm to work with map size 150 * 150
- when we also need our algorithm to work with map size 1500 * 1500
Designing an optimal state representation or algorithm may be quite different when the problem settings differ this much.
Yes, this is a good point. Let’s look at the big picture.
The goal of this challenge is to find efficient solutions to deal with very large environments.
For example, for 150x150 environments, operations research solutions could easily solve the problem perfectly. But they will take hours to find a solution when the environments get larger. This is a real-world problem for logistics companies: when a train breaks down, it takes too long to find an updated schedule.
So, the goal is to find a solution which can solve environments of any size within a short computing time. We don’t necessarily want to find an optimal plan, but we want to find one that is good enough quickly! As long as you don’t have a new schedule, none of the trains can move.
So, the problems in Round 2 will be larger than in Round 1. It is also possible that we make the Round 1 environments larger at the end of the current Warm-Up Round (= at the end of the month).
Your solutions should not assume that the environments have a given maximum size, as we will make them as large as we can!
Thanks @MasterScrat for the quick reply.
Your reply makes things much clearer.
As you mentioned, small maps may be better handled with operations research.
Will there be test cases with small maps?
If yes, then we may need to implement an operations research algorithm alongside the RL algorithm.
My question is: will you set a minimum map size? For example, larger than K x K, so that most operations research algorithms cannot solve the problem within the time limit. That way we can focus on really large maps.
There will be small grids in Round 1, so people can see progress even if they can’t solve the largest environments.
In Round 2, the smallest grids will be much larger, so they will potentially become problematic for pure OR approaches.
An idea could be to combine OR and RL in a smart way, e.g. plan with OR as much as possible during the 5-minute initial planning phase, then use RL for the parts you didn’t have time to fully plan and for when malfunctions occur. This way you use each method for what it is best at.
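The hybrid idea could be sketched roughly as below. This is an illustration under stated assumptions, not a reference implementation: `plan_with_budget`, `choose_action`, `toy_planner` and `toy_policy` are hypothetical names, the OR planner and RL policy are stand-in callables, and a real solution would need proper re-planning after a malfunction instead of simply deferring to the policy.

```python
import time


def plan_with_budget(agent_ids, or_plan_agent, budget_seconds, clock=time.monotonic):
    """Plan agents one by one with an OR-style planner until the budget runs out.

    Returns (plans, unplanned): plans maps agent id -> list of actions,
    unplanned lists the agents left for the learned policy.
    """
    deadline = clock() + budget_seconds
    plans, unplanned = {}, []
    for agent in agent_ids:
        if clock() >= deadline:
            # Out of planning time: leave this agent to the RL policy.
            unplanned.append(agent)
            continue
        plans[agent] = or_plan_agent(agent)
    return plans, unplanned


def choose_action(agent, step, plans, malfunctioning, rl_policy, observation):
    """Follow the precomputed plan while it is usable, otherwise ask the RL policy."""
    plan = plans.get(agent)
    if plan is None or agent in malfunctioning or step >= len(plan):
        return rl_policy(observation)
    return plan[step]


# Hypothetical stand-ins for a real OR planner and a trained policy.
def toy_planner(agent):
    return ["forward"] * 3


def toy_policy(observation):
    return "stop"


# 300 seconds corresponds to the 5-minute initial planning phase mentioned above.
plans, unplanned = plan_with_budget([0, 1, 2], toy_planner, budget_seconds=300)
print(choose_action(0, 0, plans, set(), toy_policy, None))  # planned agent
print(choose_action(0, 0, plans, {0}, toy_policy, None))    # same agent, malfunctioning
```

Passing the clock in as a parameter keeps the budget logic easy to test without waiting for real time to elapse.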