Expert demonstrations for Imitation Learning: Recreating Malfunctions

fabianpieroth · September 8, 2020, 7:43am

Hi,

I am trying to recreate environments exactly from expert demonstrations. Everything works fine as long as there are no malfunctions. However, the malfunctions do not seem to occur at the same times in the second simulation.
I use the same seed both when creating the expert demonstrations and in the simulation that produces the RLlib format.
The environments are created as follows:
Expert demonstration:

env = RailEnv(
    width=width,
    height=height,
    rail_generator=rail_generator,
    schedule_generator=schedule_generator,
    number_of_agents=nr_trains,
    malfunction_generator_and_process_data=malfunction_from_params(
        stochastic_data
    ),
    obs_builder_object=observation_builder,
    remove_agents_at_target=True,
    record_steps=True,
    random_seed=seed
)
obs = env.reset(random_seed=seed)

Simulation to create RLlib format:

env = RailEnv(
    width=1,
    height=1,
    rail_generator=rail_from_file(env_file),
    schedule_generator=schedule_from_file(env_file),
    malfunction_generator_and_process_data=malfunction_from_file(env_file),
    obs_builder_object=obs_builder_object,
    random_seed=random_seed
)

obs, info = env.reset(
    regenerate_rail=True,
    regenerate_schedule=True,
    activate_agents=False,
    random_seed=random_seed
)

where random_seed is the same seed that was used when creating the expert demonstrations.
The malfunctions are still not occurring at the same times. How can I achieve that?
Thank you for your help!

nilabha · September 12, 2020, 5:39pm

Are you using the same flatland version both for creating and for loading the environments? The AICrowd baselines for MARWIL and Ape-X DQfD generated their experiences mostly in environments without malfunctions, and they used a seed value of 1001 (https://flatland.aicrowd.com/research/baselines/imitation_learning.html).
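
One way to rule out a version mismatch is to print the installed package version in both the environment that creates the demonstrations and the one that loads them. The sketch below uses only the Python standard library (3.8+); the helper name installed_version is made up for illustration, and "flatland-rl" is the pip distribution name mentioned later in this thread.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run this in both environments and compare the output.
    print("flatland-rl:", installed_version("flatland-rl"))
```

If the two outputs differ, the saved environment file may be interpreted differently on load, regardless of the seed.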

fabianpieroth · September 14, 2020, 12:30pm

Hi @nilabha,
thank you for your answer. I reused the OR solution from last year's winning entry to create new expert data. For this, I used the same flatland version as in the imitation step. This is currently version 2.2.1, as I am waiting for the pip release of the newest flatland version.

nilabha · September 14, 2020, 1:04pm

You could also try our online RL Solution which does not require any of these intermediate steps like generating experiences. It runs everything on the fly…
You can find a pure IL and IL + PPO solution here

We haven’t documented it yet, but we will do so soon. It uses last year’s 2nd place solution from CkUA. Unfortunately, that solution was written for an earlier flatland version, so for now you have to change the malfunction behaviour to match the previous versions, as follows.

Change the line below in the method malfunction_from_file in the file flatland.envs.malfunction_generators.py:

mean_malfunction_rate = 1/oMPD.malfunction_rate
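
The reciprocal in the line above suggests that the stored value is read either as a per-step rate or as a mean number of steps between malfunctions, depending on the flatland version. That reading is an assumption, not something stated in this thread. The toy sketch below (plain Python, no flatland calls; both function names are hypothetical) shows how far the two interpretations are apart.

```python
import random


def mean_interval_from_rate(rate: float) -> float:
    """Convert a per-step malfunction probability into mean steps between malfunctions."""
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    return 1.0 / rate


def count_malfunction_starts(rate: float, steps: int, seed: int) -> int:
    """Toy Bernoulli process (not flatland's model): count steps on which a malfunction starts."""
    rng = random.Random(seed)
    return sum(rng.random() < rate for _ in range(steps))


if __name__ == "__main__":
    # A rate of 0.25 per step corresponds to one malfunction every 4 steps on average.
    print(mean_interval_from_rate(0.25))
    # Same seed, different interpretation of the stored value: very different counts.
    print(count_malfunction_starts(0.25, 1000, seed=1001))
    print(count_malfunction_starts(1.0, 1000, seed=1001))
```

If the value is written under one convention and read under the other, the malfunction frequency changes even when the seed is identical.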

nilabha · September 14, 2020, 1:07pm

It differs slightly from the MARWIL/Ape-X DQfD versions in that each episode is run either via IL or via RL (the ratio defaults to 50%, but it can be changed, and also decayed over time, through the configs).

fabianpieroth · September 14, 2020, 3:18pm

Thank you for your suggestion! I will try to recreate the environments with malfunctions again as soon as the PyPI version of the flatland package is out. I would still welcome suggestions on how to tackle this!

Until then, I will keep using your on-the-fly creation.

nilabha · September 19, 2020, 5:05am

The flatland-rl version has been updated to 2.2.2 (upgrade it using the command pip install -U flatland-rl). Can you check whether the malfunctions are replicable with the same seed? Let us know if you face any issues.
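
A simple way to run that check is to record, for every step, each agent's remaining malfunction duration in both runs and look for the first step where the two records differ. The sketch below is a minimal illustration in plain Python: first_divergence and toy_trace are hypothetical helpers, and toy_trace is only a seeded stand-in for a real rollout, not flatland code.

```python
import random
from typing import List, Optional, Sequence

Trace = List[List[int]]  # trace[step][agent] = remaining malfunction steps


def first_divergence(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> Optional[int]:
    """Return the first step at which two traces differ, or None if they are identical."""
    for step, (row_a, row_b) in enumerate(zip(a, b)):
        if list(row_a) != list(row_b):
            return step
    if len(a) != len(b):
        # One trace is a strict prefix of the other.
        return min(len(a), len(b))
    return None


def toy_trace(seed: int, n_agents: int = 3, n_steps: int = 50, rate: float = 0.1) -> Trace:
    """Stand-in for an environment rollout, driven only by a seeded RNG."""
    rng = random.Random(seed)
    remaining = [0] * n_agents
    trace: Trace = []
    for _ in range(n_steps):
        for i in range(n_agents):
            if remaining[i] > 0:
                remaining[i] -= 1  # malfunction is counting down
            elif rng.random() < rate:
                remaining[i] = rng.randint(2, 5)  # a new malfunction starts
        trace.append(list(remaining))
    return trace


if __name__ == "__main__":
    print(first_divergence(toy_trace(1001), toy_trace(1001)))  # same seed
    print(first_divergence(toy_trace(1001), toy_trace(1002)))  # different seed
```

With flatland, each row of the trace would presumably be collected after every env.step call from the agents' malfunction state (for example agent.malfunction_data["malfunction"], which is an assumption about the 2.2.x agent attributes). The step returned by first_divergence then shows where the recreated simulation first departs from the expert run.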