Actually, I find the max_time_steps formula to be a bit incorrect. When I generate local tests with different number of agents and different number of cities (starting from the example from the repository), I sometimes see the simulation ending earlier than expected. After running more such tests, it seems obvious that the actual formula is:
max_time_steps = int(4 * 2 * (env.width + env.height + number_of_agents / number_of_cities))
So the last term is only 20 when the ratio of agents to cities is 20. I don’t seem to find how to get the number of cities, and I also can’t find a function which returns the number of time steps (without being passed the actual ratio agents/cities as an argument).
I would really like to know the maximum number of time steps when making decisions - can you please suggest a way to achieve this?