Hey @beibei,
the different levels in one test have the same railway settings and same number of agents. The difference is the malfunction rate.
Correct!
Does it mean within one test,
- the railway networks (maps) are the same
- the initial positions and target positions for agents are the same
No, the railway networks and initial positions and targets are different for every level, even within the same test.
The parameters within one test are fixed (except for the malfunction rate), but each environment is still procedurally generated from these parameters, which results in different maps for each environment.
- agents will malfunction at different times and for different durations
The rate of malfunction changes between the different environments within the same test. The maximum rate of malfunction (per agent) is max_mf_rate = 1.0 / min_malfunction_interval = 1.0 / 250.
You can see in more detail how the malfunction rate changes within a test here: https://flatland.aicrowd.com/getting-started/environment-configurations.html#round-2
The malfunction time range is malfunction_duration = [20,50] for all the environments in all the tests (sampled uniformly).
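To make this concrete, here is a minimal sketch of how a malfunction could be simulated under these settings. The per-timestep Bernoulli draw and the function name are my own illustrative assumptions, not the actual Flatland implementation:

```python
import random

# Illustrative assumptions: one Bernoulli draw per timestep at the malfunction
# rate, and a duration sampled uniformly from [20, 50] as described above.
MAX_MALFUNCTION_RATE = 1.0 / 250      # upper bound, per agent and per timestep
MALFUNCTION_DURATION = (20, 50)       # same range for all tests

def maybe_break_agent(malfunction_rate: float) -> int:
    """Return a malfunction duration in timesteps, or 0 if the agent keeps running."""
    if random.random() < malfunction_rate:
        return random.randint(*MALFUNCTION_DURATION)
    return 0

# Example: simulate one agent over an episode at the maximum rate.
remaining = 0
for step in range(1000):
    if remaining > 0:
        remaining -= 1                # agent is stuck until the malfunction ends
    else:
        remaining = maybe_break_agent(MAX_MALFUNCTION_RATE)
```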
Another concern I have is about the timesteps. When I evaluate locally, there is “Evaluation finished in *** timesteps…”. Does each environment (level) still have a timestep limit? Or is the score calculated based on the done agents and the timesteps? Besides, how do you calculate the total reward on the leaderboard? Is it the sum of the normalized reward in each environment?
Each environment does have its own timestep limit as in Round 1, which you can get from self.env._max_episode_steps. It is defined as int(4 * 2 * (env.width + env.height + num_agents / num_cities)) (see https://gitlab.aicrowd.com/flatland/flatland/blob/master/flatland/envs/schedule_generators.py#L188).
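For a concrete feel of that formula, here is a quick calculation (the width, height, agent count, and city count below are made-up example values):

```python
# Hypothetical example values, just to illustrate the formula above.
width, height = 40, 40
num_agents, num_cities = 20, 4

max_episode_steps = int(4 * 2 * (width + height + num_agents / num_cities))
print(max_episode_steps)  # int(8 * 85.0) = 680
```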
The score is calculated based on the done agents and the timesteps. We use the same normalized reward as in Round 1, but add 1.0 to make it between 0.0 and 1.0:
normalized_reward = 1.0 + sum_of_rewards / (self.env._max_episode_steps * self.env.get_num_agents())
And then indeed the total reward that counts for the leaderboard is the sum of the normalized reward for each environment.
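As a rough sketch of how that score could be accumulated (the variable names and the per-environment numbers below are my own examples, not the real evaluator):

```python
def normalized_reward(sum_of_rewards: float, max_episode_steps: int, num_agents: int) -> float:
    # Same formula as above: shift the Round 1 normalized reward into [0.0, 1.0].
    return 1.0 + sum_of_rewards / (max_episode_steps * num_agents)

# Hypothetical per-environment results: (sum_of_rewards, max_episode_steps, num_agents)
episodes = [(-3400.0, 680, 20), (-1200.0, 680, 20)]

leaderboard_score = sum(normalized_reward(r, t, n) for r, t, n in episodes)
print(leaderboard_score)  # 0.75 + ~0.91 = ~1.66
```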
You have more details here: https://flatland.aicrowd.com/getting-started/prize-and-metrics.html
And in the Round 2 announcement post: 🚂 Here comes Round 2!
And in the Round 2 environment configuration page: https://flatland.aicrowd.com/getting-started/environment-configurations.html#round-2