The current parameter set only includes environments with at most 200 agents. We introduced this limit because of some performance issues we were experiencing. If this changes in the future, we will let you know and also re-evaluate all previous submissions on the new environment parameters.
There is again an upper limit on the allowed run time. It is currently set to 12 hours, with a time-out if nothing happens on the server for 15 minutes.
Even though performance has improved a lot with the latest fixes, we are still working on further improvements. If we achieve the desired performance, there might be slight updates to the time limit as well as the number of agents. We will communicate this transparently when something changes.
Also another question, more as a clarification, to make sure I understood things correctly. Is it true that once an agent starts moving towards an adjacent cell, it won't be able to make any other decisions until it reaches that cell? Even if reaching it may take longer than 1/speed turns (e.g. because that cell is occupied by other trains, etc.). In my local tests I've seen in some cases the position_fraction can increase beyond 1.0 (even a value of 1.0 can only occur if the agent can't enter the new cell as soon as its speed allows). So I'm guessing that as long as position_fraction is strictly greater than zero, the agent can't make any new decisions, is that correct?
Thank you for reaching out, hope this helps clarify the issue:
Envs are currently never larger than (height, width) = (150, 150).
Yes, agents can only make decisions on cell entry. Once they have decided and moved beyond the entry point, there is no turning back. If, however, they choose to stop at cell entry, they will again be allowed to choose an action.
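A minimal sketch of the rule above, assuming the agent's progress through its current cell is available as the position_fraction value mentioned in the question. The helper names can_choose_action and filter_actions, and passing the fractions as a plain dict, are hypothetical and not part of the Flatland API.

```python
from typing import Dict


def can_choose_action(position_fraction: float, eps: float = 1e-9) -> bool:
    """Return True if the agent is at cell entry and may pick a new action.

    Assumption: position_fraction is 0 at cell entry and strictly positive
    once the agent has started moving towards the next cell.
    """
    return position_fraction <= eps


def filter_actions(proposed: Dict[int, int],
                   fractions: Dict[int, float]) -> Dict[int, int]:
    """Keep only the actions of agents that are currently at cell entry."""
    return {handle: action
            for handle, action in proposed.items()
            if can_choose_action(fractions[handle])}
```

Filtering like this only saves work on the policy side; the environment decides on its own which actions it accepts.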
Actually, I find the max_time_steps formula to be a bit incorrect. When I generate local tests with different number of agents and different number of cities (starting from the example from the repository), I sometimes see the simulation ending earlier than expected. After running more such tests, it seems obvious that the actual formula is:
So the last term is only 20 when the ratio of agents to cities is 20. I can't seem to find how to get the number of cities, and I also can't find a function which returns the number of time steps (without being passed the actual ratio agents/cities as an argument).
I would really like to know the maximum number of time steps when making decisions - can you please suggest a way to achieve this?
The actual formula that you mentioned is correct. There is a problem when loading from files, as we currently don't store the number of cities in the pickle file. Thus it is currently impossible for you to compute the appropriate max_time_steps for the pickle file without the associated generator parameters.
I will open an issue about this on GitLab and address it in the coming days. Sorry for the inconvenience.
I just wanted to add some behaviour which is related to this.
If one runs more steps on the environment and some of the agents are in a deadlock, then the environment exits as soon as max_steps according to your formula has been reached. Additionally, all entries in done are set to True and the corresponding positive rewards are returned as well. This is unfortunate for training purposes.
Please correct me if I misunderstood something.
Setting done=True for the agents is necessary for training to indicate that the episode has terminated, so that your ML approach does not expect a next observation anymore.
The returned reward should reflect the current state of the environment. Thus, if not all agents have reached their target, the reward is equal to the step reward of each agent. If you need a more negative reward for agents that do not finish their task in time, you could do reward shaping using the env information. env.agent.status will tell you whether or not the agent has finished its individual task.
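The reward shaping suggested above could look roughly like the following sketch. It assumes you have already derived a per-agent "finished" flag from the agent status. The function name shape_rewards, the finished dict and the penalty value are hypothetical choices for illustration, not something the environment provides.

```python
from typing import Dict


def shape_rewards(rewards: Dict[int, float],
                  finished: Dict[int, bool],
                  episode_over: bool,
                  timeout_penalty: float = -10.0) -> Dict[int, float]:
    """Add an extra penalty for agents that did not finish before the episode ended.

    rewards:         per-agent rewards as returned by the environment step.
    finished:        per-agent flag, assumed to be derived from the agent status.
    episode_over:    True once the episode has terminated.
    timeout_penalty: hypothetical extra (negative) reward for unfinished agents.
    """
    if not episode_over:
        # Leave the rewards untouched while the episode is still running.
        return dict(rewards)
    return {handle: reward if finished[handle] else reward + timeout_penalty
            for handle, reward in rewards.items()}
```

For example, with rewards {0: -1.0, 1: -1.0}, agent 0 finished and agent 1 not, the shaped rewards at episode end are {0: -1.0, 1: -11.0}.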
Looking at the code, I see that if you continue stepping the environment beyond the time at which it terminated, it will return the positive reward to all agents. This is a bug on our side and we will fix this.
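Until that is fixed, one way to avoid collecting these rewards is to stop stepping as soon as the episode reports termination. The sketch below assumes that step returns (observations, rewards, done, info) and that the done dict carries an "__all__" entry for the whole episode. CountdownEnv is a hypothetical stand-in used only to make the example runnable, and run_episode is an illustrative name.

```python
from typing import Callable, Dict, Tuple


class CountdownEnv:
    """Hypothetical stand-in env that terminates after a fixed number of steps."""

    def __init__(self, horizon: int):
        self.horizon = horizon
        self.t = 0

    def step(self, actions: Dict[int, int]):
        self.t += 1
        over = self.t >= self.horizon
        # Assumed layout: one entry per agent plus "__all__" for the episode.
        done = {0: over, "__all__": over}
        return {0: self.t}, {0: -1.0}, done, {}


def run_episode(env, obs, choose_actions: Callable, step_limit: int) -> Tuple[int, float]:
    """Step the env until it reports termination or step_limit is reached."""
    total_reward = 0.0
    steps = 0
    for _ in range(step_limit):
        obs, rewards, done, _info = env.step(choose_actions(obs))
        steps += 1
        total_reward += sum(rewards.values())
        if done["__all__"]:
            # Never step past termination, so no post-episode rewards are collected.
            break
    return steps, total_reward
```

Here step_limit is only a safety cap on your side; the environment's own step budget still decides when done is set.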
Are there any updates about this? It would really help my approach if I knew the number of allowed time steps exactly. Unless this is not desired (estimating the number of cities can also be part of the challenge, I'd just like to know if that's the case).
Sorry for the delay. I will look into this now. I will let you know if I can provide a fix, as the levels are currently stored in pickle files and don't contain the information about the number of cities. Maybe we will have to regenerate the files with this updated information. I will let you know as soon as I have a fix.
Looking at the generated files, we use the default value of 20 agents for the files used for submission scoring. I will, however, still check whether this can be made more precise in the future. Does this help you with your submissions?