Hi,
I am trying to exactly recreate environments from expert demonstrations. Everything works fine as long as there are no malfunctions. With malfunctions enabled, however, they do not occur at the same time steps in the second simulation as they did in the first.
I use the same seed both when creating the expert demonstrations and when re-running the simulation to convert them to the RLlib format.
The environments are created as follows:
Expert demonstration:
env = RailEnv(
    width=width,
    height=height,
    rail_generator=rail_generator,
    schedule_generator=schedule_generator,
    number_of_agents=nr_trains,
    malfunction_generator_and_process_data=malfunction_from_params(
        stochastic_data
    ),
    obs_builder_object=observation_builder,
    remove_agents_at_target=True,
    record_steps=True,
    random_seed=seed
)
obs = env.reset(random_seed=seed)
Simulation to create RLlib format:
env = RailEnv(
    width=1,
    height=1,
    rail_generator=rail_from_file(env_file),
    schedule_generator=schedule_from_file(env_file),
    malfunction_generator_and_process_data=malfunction_from_file(env_file),
    obs_builder_object=obs_builder_object,
    random_seed=random_seed
)
obs, info = env.reset(
    regenerate_rail=True,
    regenerate_schedule=True,
    activate_agents=False,
    random_seed=random_seed
)
where random_seed is the same seed that was used when creating the expert demonstrations.
The malfunctions are still not occurring at the same times. How can I achieve that?
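To narrow down where the two runs start to differ, something like the following could be used. It is only a minimal sketch: first_divergence is a hypothetical helper, not part of Flatland, and it assumes that each run has been logged as one list per step holding one value per agent (for example the remaining malfunction duration). In Flatland 2.x that value can presumably be read from agent.malfunction_data['malfunction'], but that is an assumption that should be checked against the installed version.

```python
from typing import List, Optional, Tuple

# One entry per env step; each entry holds one value per agent
# (e.g. the remaining malfunction duration of that agent at that step).
Trace = List[List[int]]


def first_divergence(trace_a: Trace, trace_b: Trace) -> Optional[Tuple[int, int]]:
    """Return (step, agent_index) of the first mismatch between two traces.

    Only the steps and agents present in both traces are compared.
    Returns None if no mismatch is found in that common part.
    """
    for step, (row_a, row_b) in enumerate(zip(trace_a, trace_b)):
        for agent, (a, b) in enumerate(zip(row_a, row_b)):
            if a != b:
                return step, agent
    return None


# Made-up example data: two agents, three steps.
expert = [[0, 0], [0, 3], [0, 2]]   # agent 1 breaks down at step 1
replay = [[0, 0], [0, 0], [0, 3]]   # agent 1 breaks down one step later
print(first_divergence(expert, replay))  # (1, 1)
```

Knowing the first step and agent at which the traces disagree should make it easier to tell whether the malfunctions are shifted from the very start or only drift apart later.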
Thank you for your help!