Config of simulation environment during training and evaluation

junjie_li · June 22, 2020, 6:43am

For RL to work well, the simulation environment used for training should be configured similarly to the one used for evaluation.

To help us set up the training environment properly, could you provide some basic information about the evaluation environment?
For example, the range of the following settings:

width and height of map
number of trains
number of cities
type of city distribution
speed ratio of trains
max rails between cities
max rails in cities
type of schedule generator
malfunction: rate, min/max duration.
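One way to keep these settings in one place on the training side, so the same object drives every environment you build, is a small config container. This is a minimal sketch: the class name `EnvConfig`, its field names and its default values are all hypothetical and only mirror the list above; this is not Flatland's own API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class EnvConfig:
    """Hypothetical container mirroring the settings listed above (not Flatland's API)."""
    width: int = 30
    height: int = 30
    n_trains: int = 5
    n_cities: int = 3
    grid_city_distribution: bool = False  # city distribution: grid layout vs. random placement
    # Speed -> fraction of trains with that speed; fractions should sum to 1.
    speed_ratio_map: Dict[float, float] = field(default_factory=lambda: {1.0: 1.0})
    max_rails_between_cities: int = 2
    max_rails_in_city: int = 3
    schedule_generator: str = "sparse"  # hypothetical name of the schedule generator to use
    malfunction_rate: float = 0.0  # e.g. 1/50
    malfunction_min_duration: int = 20
    malfunction_max_duration: int = 50

    def __post_init__(self) -> None:
        # Basic sanity checks so a bad config fails before training starts.
        if self.width <= 0 or self.height <= 0:
            raise ValueError("map dimensions must be positive")
        if not 0.0 <= self.malfunction_rate <= 1.0:
            raise ValueError("malfunction_rate must be between 0 and 1")
        if self.malfunction_min_duration > self.malfunction_max_duration:
            raise ValueError("min malfunction duration exceeds max")
        if abs(sum(self.speed_ratio_map.values()) - 1.0) > 1e-9:
            raise ValueError("speed ratios must sum to 1")
```

Validating in `__post_init__` means an inconsistent combination (for example a minimum malfunction duration above the maximum) is rejected as soon as the config is created, rather than surfacing mid-training.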

MasterScrat · June 22, 2020, 8:15am

Hello @junjie_li,

The goal of this challenge is to design a policy that is able to generalize to any kind of environment. For this reason, we don’t disclose all the details about the evaluation environments.

However, you can get some details about them:

From the FAQ (https://flatland.aicrowd.com/faq/challenge.html#what-are-the-evaluation-parameters)

The environments vary in size and number of agents as well as malfunction parameters.

For Round 1 of the NeurIPS 2020 challenge, the upper limit of these variables for submissions are:

(x_dim, y_dim) <= (150, 150)

n_agents <= 400

malfunction_rate <= 1/50
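Given only these upper limits, a training script could sample environments that stay within them. The sketch below is an assumption-laden illustration: the lower bounds, the uniform distributions and the agent-density cap that ties the number of agents to the map area are all our own choices, not part of the FAQ. The provided test set is a better guide to the actual distribution.

```python
import random
from typing import Dict

# Round 1 upper limits quoted from the FAQ (subject to change during the challenge).
MAX_DIM = 150
MAX_AGENTS = 400
MAX_MALFUNCTION_RATE = 1 / 50

# Assumptions for illustration only, not from the FAQ.
MIN_DIM = 25
CELLS_PER_AGENT = 50  # assumed density: at most one agent per 50 cells


def sample_round1_config(rng: random.Random) -> Dict[str, float]:
    """Draw one training-environment config within the Round 1 upper limits."""
    x_dim = rng.randint(MIN_DIM, MAX_DIM)  # randint includes both end points
    y_dim = rng.randint(MIN_DIM, MAX_DIM)
    # Keep small maps from being flooded with agents, but never exceed the global cap.
    agent_cap = max(1, min(MAX_AGENTS, (x_dim * y_dim) // CELLS_PER_AGENT))
    n_agents = rng.randint(1, agent_cap)
    malfunction_rate = rng.uniform(0.0, MAX_MALFUNCTION_RATE)
    return {
        "x_dim": x_dim,
        "y_dim": y_dim,
        "n_agents": n_agents,
        "malfunction_rate": malfunction_rate,
    }
```

Passing in a seeded `random.Random` keeps the sampled curriculum reproducible across training runs.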

These parameters are subject to change during the challenge.

Provided test set: https://www.aicrowd.com/challenges/neurips-2020-flatland-challenge/dataset_files

This gives you an idea of the distribution of evaluation environments you will have to solve when you do a submission.

From the documentation:

Speed profiles are not used in the first round of the NeurIPS 2020 challenge.

So you can just set all the trains to a speed of 1.0.
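Flatland describes speed profiles as a mapping from speed to the fraction of trains with that speed. As far as we know, its schedule generators take a mapping of this shape (we believe the parameter is called `speed_ratio_map`, but verify this against your installed version). The sketch below does not call Flatland: it shows that, for Round 1, the mapping collapses to a single entry, and the helper `sample_speeds` (hypothetical) then gives every train speed 1.0.

```python
import random
from typing import Dict, List

# Round 1 uses no speed profiles: every train runs at full speed.
ROUND1_SPEED_RATIO_MAP: Dict[float, float] = {1.0: 1.0}


def sample_speeds(speed_ratio_map: Dict[float, float], n_trains: int,
                  rng: random.Random) -> List[float]:
    """Assign each train a speed drawn according to the speed -> fraction mapping."""
    speeds = list(speed_ratio_map.keys())
    weights = list(speed_ratio_map.values())
    return rng.choices(speeds, weights=weights, k=n_trains)


print(sample_speeds(ROUND1_SPEED_RATIO_MAP, 5, random.Random(0)))
# → [1.0, 1.0, 1.0, 1.0, 1.0]
```

If speed profiles are introduced in a later round, only the mapping needs to change, for example `{1.0: 0.25, 0.5: 0.25, 1/3: 0.25, 0.25: 0.25}`.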

junjie_li · June 22, 2020, 9:02am

Thanks @MasterScrat for the kind reply.

May I ask how much Round 1 and Round 2 may differ?

Consider these two settings:

when our algorithm only needs to handle maps of size 150 x 150
when it also needs to handle maps of size 1500 x 1500

The best state representation and algorithm may be quite different depending on which setting applies.
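The arithmetic behind this concern: multiplying the side length by 10 multiplies the number of cells by 100, so any state representation that encodes the whole map grows quadratically with the side length. The sketch below assumes, purely for illustration, 16 feature channels per cell.

```python
def full_grid_obs_size(width: int, height: int, channels: int) -> int:
    """Number of values in an observation that encodes every cell of the map."""
    return width * height * channels


CHANNELS = 16  # assumed number of per-cell features, for illustration only

small = full_grid_obs_size(150, 150, CHANNELS)    # 360_000 values
large = full_grid_obs_size(1500, 1500, CHANNELS)  # 36_000_000 values
print(large // small)  # → 100
```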

MasterScrat · June 22, 2020, 9:29am

Yes, this is a good point. Let’s look at the big picture.

The goal of this challenge is to find efficient solutions to deal with very large environments.

For example, for 150x150 environments, operations research solutions could easily solve the problem perfectly. But they take hours to find a solution once the environments get larger. This is a real-world problem for logistics companies: when a train breaks down, it takes too long to find an updated schedule.

So, the goal is to find a method that can solve environments of any size within a short computing time. We don’t necessarily need an optimal plan, but we want a good-enough one quickly! Until a new schedule is available, none of the trains can move.
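"Good enough, quickly" is the idea behind anytime planning: start from a feasible (possibly poor) plan, keep improving it, and return the best plan found when the time budget runs out. The sketch below is generic; `improve` and `cost` are hypothetical callables you would supply for your own plan representation, and the toy usage at the end stands in for a real rescheduling problem.

```python
import time
from typing import Callable, Tuple, TypeVar

Plan = TypeVar("Plan")


def anytime_plan(
    initial: Plan,
    improve: Callable[[Plan], Plan],
    cost: Callable[[Plan], float],
    budget_s: float,
) -> Tuple[Plan, float]:
    """Return the best plan found before the time budget runs out.

    A plan is available from the start, so trains can move even if the
    search is cut short.
    """
    deadline = time.monotonic() + budget_s
    best, best_cost = initial, cost(initial)
    current = initial
    while time.monotonic() < deadline:
        # The current candidate may get worse (exploration); best-so-far is kept.
        current = improve(current)
        current_cost = cost(current)
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return best, best_cost


# Toy usage: "plans" are integers and the ideal plan is 42.
best, best_cost = anytime_plan(
    initial=0,
    improve=lambda x: x + 1 if x < 42 else x - 1,
    cost=lambda x: abs(x - 42),
    budget_s=0.05,
)
```

The key property is that stopping early still yields a usable answer, which is what matters when trains are waiting for a new schedule.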

So, the problems in Round 2 will be larger than in Round 1. It is also possible that we make the Round 1 environments larger at the end of the current Warm-Up Round (= at the end of the month).

Your solutions should not assume that the environments have a given maximum size, as we will make them as large as we can!
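One common way to avoid baking a map size into an RL agent is to give it a fixed-size local view around its own position, padded beyond the map edges. The sketch below assumes the map is a 2D grid of integer cell codes and uses a hypothetical padding value `WALL = -1`; it is an illustration of the idea, not one of Flatland's built-in observation builders.

```python
from typing import List, Sequence

WALL = -1  # assumed padding value for cells outside the map


def local_window(grid: Sequence[Sequence[int]], row: int, col: int,
                 radius: int) -> List[List[int]]:
    """Fixed-size (2r+1) x (2r+1) view centred on (row, col), padded beyond the map edges.

    The output shape depends only on `radius`, never on the map size.
    """
    height = len(grid)
    width = len(grid[0]) if height else 0
    window = []
    for r in range(row - radius, row + radius + 1):
        line = []
        for c in range(col - radius, col + radius + 1):
            if 0 <= r < height and 0 <= c < width:
                line.append(grid[r][c])
            else:
                line.append(WALL)
        window.append(line)
    return window


grid = [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]]
print(local_window(grid, 0, 0, 1))
# → [[-1, -1, -1], [-1, 0, 1], [-1, 3, 4]]
```

Because the same network input shape works on a 30x30 map and a 1500x1500 map, a policy trained on small maps can at least be run on larger ones without changes.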

junjie_li · June 22, 2020, 9:33am

Thanks @MasterScrat for the quick reply.

Your reply makes things much clearer.

junjie_li · June 22, 2020, 10:23am

As you mentioned, small maps may be better handled with operations research.

Will there be test cases with small maps?

If so, we may need to implement an operations research algorithm alongside the RL algorithm.

My question is: will you set a minimum map size? For example, larger than K x K, so that most operations research algorithms cannot solve the problem within the time limit. That way we could focus on truly large maps.

MasterScrat · June 22, 2020, 11:20am

There will be small grids in Round 1, so people can see progress even if they can’t solve the largest environments.

In Round 2, the smallest grids will be much larger, so they will potentially become problematic for pure OR approaches.

An idea could be to combine OR and RL in a smart way, e.g., plan with OR as much as possible during the 5-minute initial planning phase, then use RL for the parts you didn’t have time to fully plan and for when malfunctions occur. This way you use each method for what it does best.
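A minimal sketch of that split, under several assumptions: `plan_one` (an OR-style planner for a single agent) and `rl_policy` are hypothetical callables you would supply, actions are plain integers, and a malfunction simply discards the affected agent's plan. A real system would also need to repair the plans of other trains that the broken-down train blocks.

```python
import time
from typing import Callable, Dict, List, Optional

Action = int


def plan_with_budget(
    agent_ids: List[int],
    plan_one: Callable[[int], Optional[List[Action]]],
    budget_s: float,
) -> Dict[int, List[Action]]:
    """Plan agents one by one with an OR-style planner until the budget is spent.

    Agents not reached before the deadline, or for which planning fails, get no plan.
    """
    deadline = time.monotonic() + budget_s
    plans: Dict[int, List[Action]] = {}
    for agent in agent_ids:
        if time.monotonic() >= deadline:
            break
        plan = plan_one(agent)
        if plan is not None:
            plans[agent] = plan
    return plans


def choose_action(
    agent: int,
    step: int,
    plans: Dict[int, List[Action]],
    malfunctioning: bool,
    rl_policy: Callable[[int], Action],
) -> Action:
    """Follow the precomputed plan when possible; otherwise defer to the RL policy."""
    if malfunctioning:
        # A breakdown shifts the timetable, so this agent's plan is no longer valid.
        plans.pop(agent, None)
    plan = plans.get(agent)
    if plan is not None and step < len(plan):
        return plan[step]
    return rl_policy(agent)
```

With the 5-minute initial planning phase, the call would look like `plan_with_budget(agent_ids, plan_one, budget_s=300)`, ideally leaving some margin for overhead. Agents without a plan, or whose plan was dropped after a malfunction, fall through to the RL policy at every step.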