[ANNOUNCEMENT] Submission working for Round 2

mlerik · October 25, 2019, 3:27pm

Dear Flatlanders,

We have resolved the performance issues that kept submissions from working properly!

You can now submit your solutions using the starter kit.

Be sure to download the latest version from PyPI by running pip install -U flatland-rl.

Have fun with the challenge and feel free to reach out or share your thoughts here in the Forums.

The Flatland Team

beibei · October 25, 2019, 5:11pm

Dear Erik,

Thank you and your team for the hard work. It is great to hear that the trains are starting to depart again.
Just a couple of small questions:

How many trains can there be in one evaluation environment, and how large can the map be in the second round? What are the maximum values?
Is there a run-time limit for the second round like before (the first round was 8 hours, if I remember correctly)?

Best,
Beibei

mlerik · October 25, 2019, 8:51pm

Hi @beibei

We are happy the train is rolling again.

The current parameter set only contains environments with at most 200 agents. We introduced this limit because of performance issues we were experiencing. If this changes in the future, we will let you know and also re-evaluate all previous submissions with the new environment parameters.
There is again an upper limit on the allowed run time. It is currently set to 12 hours, with a time-out if nothing happens on the server for 15 minutes.

Even though performance has increased a lot with the latest fixes, we are still working on further improvements. If we achieve the desired performance, there might be slight updates to the time limit as well as to the number of agents. We will communicate this transparently when something changes.

Best regards,

Erik

lcaubert · October 26, 2019, 2:32pm

Dear Flatland-Team,

is it possible to submit for Round 2 without having participated in Round 1?

Best regards,

Lucy

mlerik · October 26, 2019, 2:57pm

Hi @lcaubert

YES!

Just use the starter kit linked above and read the official documentation to get started.

Best of luck and have fun

Best regards,

The Flatland Team

lcaubert · October 26, 2019, 4:48pm

Thx for the quick answer!

tsuneji · October 28, 2019, 6:45pm

Dear Flatland Team,

Has the timestep limit changed for the environment? Or is it still 1.5 * (H + W)?

Thanks,
Joji

mlerik · October 28, 2019, 7:17pm

Hi @tsuneji

Yes it has changed, we now allow for much more time:

max_time_steps = int(4 * 2 * (env.width + env.height + 20))

You can find this in the run.py file in the starter kit.
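As a minimal sketch of the formula quoted above, assuming plain integer width and height (the helper name max_time_steps_round2 is hypothetical and not part of the starter kit; the example grid size is chosen for illustration only):

```python
def max_time_steps_round2(width: int, height: int) -> int:
    """Step limit as quoted in this post: int(4 * 2 * (width + height + 20))."""
    return int(4 * 2 * (width + height + 20))


# Illustrative grid size, not an official evaluation setting.
print(max_time_steps_round2(50, 50))  # 8 * (50 + 50 + 20) = 960
```

For comparison, the earlier limit of 1.5 * (H + W) mentioned in the question would give 150 steps for the same 50 x 50 grid.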

Have fun with the challenge

Best regards,

The Flatland Team

mugurelionut · October 29, 2019, 10:10am

And how large can env.width and env.height be?
Also another question, more as a clarification, to make sure I understood things correctly. Is it true that once an agent starts moving towards an adjacent cell, it won’t be able to make any other decisions until it reaches that cell? Even if reaching it may take longer than 1/speed turns (e.g. because that cell is occupied by other trains, etc.). In my local tests I’ve seen in some cases the position_fraction can increase beyond 1.0 (even a value of 1.0 can only occur if the agent can’t enter the new cell as soon as its speed allows). So I’m guessing that as long as position_fraction is strictly greater than zero, the agent can’t make any new decisions, is that correct?

mlerik · October 29, 2019, 12:26pm

Dear @mugurelionut

Thank you for reaching out, hope this helps clarify the issue:

Envs are currently never larger than (height, width) = (150, 150).
Yes, agents can only make decisions on cell entry. Once they have decided and have moved beyond the entry point, there is no turning back. If, however, they choose to stop at cell entry, they will again be allowed to choose an action.
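The rule above can be sketched as a filter that only asks the policy for an action when an agent sits at cell entry. This is a rough illustration, assuming that a position_fraction of exactly 0.0 marks cell entry (as suggested in the question); can_choose_action and select_actions are hypothetical helpers, and the action value 2 is just a placeholder:

```python
def can_choose_action(position_fraction: float) -> bool:
    # Assumption: 0.0 means the agent has not yet moved past the cell entry point.
    return position_fraction == 0.0


def select_actions(fractions, policy):
    """Query the policy only for agents that are at cell entry.

    fractions: dict mapping agent handle -> position_fraction
    policy:    callable mapping agent handle -> action
    """
    return {
        handle: policy(handle)
        for handle, fraction in fractions.items()
        if can_choose_action(fraction)
    }


# Agent 0 is at cell entry; agents 1 and 2 are already committed to a move.
fractions = {0: 0.0, 1: 0.5, 2: 1.25}
actions = select_actions(fractions, policy=lambda handle: 2)
print(actions)  # {0: 2}
```

Skipping committed agents this way avoids spending computation on decisions the environment would not act on anyway.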

Best regards,

The Flatland Team

mugurelionut · November 10, 2019, 8:22am

Actually, I find the max_time_steps formula to be slightly incorrect. When I generate local tests with different numbers of agents and cities (starting from the example in the repository), I sometimes see the simulation ending earlier than expected. After running more such tests, it seems clear that the actual formula is:

max_time_steps = int(4 * 2 * (env.width + env.height + number_of_agents / number_of_cities))

So the last term is only 20 when the ratio of agents to cities is 20. I don’t seem to find how to get the number of cities, and I also can’t find a function which returns the number of time steps (without being passed the actual ratio agents/cities as an argument).
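A minimal sketch of the formula inferred above, written as a standalone function (the function name and the example numbers are illustrative assumptions, not taken from the library):

```python
def max_time_steps(width, height, number_of_agents, number_of_cities):
    """int(4 * 2 * (width + height + number_of_agents / number_of_cities))"""
    ratio = number_of_agents / number_of_cities
    return int(4 * 2 * (width + height + ratio))


# With 20 agents per city the last term is 20, matching the earlier formula.
print(max_time_steps(50, 50, number_of_agents=40, number_of_cities=2))  # 960

# With fewer agents per city the episode ends earlier.
print(max_time_steps(50, 50, number_of_agents=10, number_of_cities=2))  # 840
```

The second call shows why a simulation can end earlier than the "+ 20" version of the formula predicts.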

I would really like to know the maximum number of time steps when making decisions - can you please suggest a way to achieve this?

mlerik · November 10, 2019, 10:45pm

Hi @mugurelionut

The formula you mentioned is correct. There is a problem when loading from files, as we currently don’t store the number of cities in the pickle file. Thus it is currently impossible for you to compute the appropriate max_time_steps for a pickle file without the associated generator parameters.
I will open an issue about this on GitLab and address it in the coming days. Sorry for the inconvenience.

Best regards,
Erik

fabianpieroth · November 12, 2019, 3:38pm

Hi @mlerik,

I just wanted to add some behaviour which is related to this.

If one runs more steps on the environment and some of the agents are in a deadlock, the environment exits as soon as max_steps according to your formula has been reached. Additionally, all entries in done are set to True and the corresponding positive rewards are returned as well. This is unfortunate for training purposes.
Please correct me if I misunderstood something.

Best regards,

Fabian

mlerik · November 12, 2019, 4:04pm

Hi @fabianpieroth

Setting done=True for the agents is necessary for training, to indicate that the episode has terminated and that your ML approach should no longer expect a next observation.

The returned reward should reflect the current state of the environment. Thus, if not all agents have reached their target, the reward is equal to the step reward of each agent. If you need a more negative reward for agents that do not finish their task in time, you could do reward shaping using the env information. env.agent.status will tell you whether or not the agent has finished its individual task.
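A rough sketch of such reward shaping follows. It does not use the real environment classes: Status is a hypothetical stand-in for the agent status, shape_rewards is a hypothetical helper, and the penalty of -100.0 is an arbitrary illustrative value:

```python
from enum import IntEnum


class Status(IntEnum):
    """Hypothetical stand-in for the per-agent status exposed by the env."""
    ACTIVE = 1
    DONE = 2


def shape_rewards(rewards, statuses, episode_over, penalty=-100.0):
    """Add a penalty for agents that have not finished when the episode ends.

    rewards:  dict mapping agent handle -> reward returned by the env
    statuses: dict mapping agent handle -> Status
    """
    if not episode_over:
        return dict(rewards)
    return {
        handle: reward + (0.0 if statuses[handle] == Status.DONE else penalty)
        for handle, reward in rewards.items()
    }


# Illustrative values: agent 0 finished, agent 1 is still on the way.
rewards = {0: 0.0, 1: -1.0}
statuses = {0: Status.DONE, 1: Status.ACTIVE}
shaped = shape_rewards(rewards, statuses, episode_over=True)
print(shaped)  # {0: 0.0, 1: -101.0}
```

In a real training loop the statuses would be read from the environment's agents rather than built by hand.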

Looking at the code, I see that if you continue stepping the environment beyond the point at which it terminated, it will return the positive reward to all agents. This is a bug on our side and we will fix it.
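One way to avoid stepping past termination on the training side is to stop as soon as the environment reports that the episode is over. A minimal sketch, assuming step returns (observations, rewards, done, info) and that done carries an "__all__" entry; FakeEnv and run_until_done are hypothetical and only exist to make the example runnable:

```python
def run_until_done(step_fn, choose_actions, max_steps):
    """Step until the env reports termination and never step past it."""
    totals = {}
    for _ in range(max_steps):
        _obs, rewards, done, _info = step_fn(choose_actions())
        for handle, reward in rewards.items():
            totals[handle] = totals.get(handle, 0.0) + reward
        if done.get("__all__", False):
            break  # Further steps could return misleading rewards.
    return totals


class FakeEnv:
    """Hypothetical stand-in environment that terminates on its third step."""

    def __init__(self):
        self.steps = 0

    def step(self, actions):
        self.steps += 1
        finished = self.steps >= 3
        return {}, {0: -1.0}, {0: finished, "__all__": finished}, {}


env = FakeEnv()
totals = run_until_done(env.step, lambda: {0: 2}, max_steps=10)
print(totals, env.steps)  # {0: -3.0} 3
```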

Hope this clarifies your question.

Best regards,
Erik

mugurelionut · November 24, 2019, 8:52pm

Are there any updates about this? It would really help my approach if I knew the number of allowed time steps exactly. Unless this is not desired (estimating the number of cities can also be part of the challenge, I’d just like to know if that’s the case).

mlerik · November 26, 2019, 7:18pm

Hi @mugurelionut

Sorry for the delay. I will look into this now. The levels are currently stored in pickle files that don’t contain the number of cities, so we may have to regenerate the files with this information. I will let you know as soon as I have a fix.

Best regards,
Erik

mlerik · November 26, 2019, 7:51pm

Hi @mugurelionut

Looking at the generated files, we use the default value of 20 for the agents-per-city term in the files used for submission scoring. I will, however, still check whether we can update this to be more precise in the future. Does this help you with your submissions?

Best regards,
Erik