Current status of imitation agent in baseline repository

milva · October 16, 2020, 11:15am

Greetings, dear organisers!
I have been trying to utilise the imitation agent resources in the baseline repository, especially imitation_agent/imitation_trainer.py and baselines/custom_imitation_learning_rllib_tree_obs/*.yaml s.
However, the imitation_trainer.py looks not working or fully implemented, at least for now. I can’t even find any usage of graph-related codes in /libs though they apparently look like existing for generating expert demonstration.
So, what’s the current status of this? Is there any way to exploit these codes?
Thanks.

MasterScrat · October 16, 2020, 12:40pm

Hey @milva,

So, we have some beautiful imitation learning machinery, with the ability to generate and persist expert demonstrations from top OR submissions, and also with the ability to figure out expert demonstrations on-the-fly (ie no need to create an expert demonstration dataset, you can just compute the best action dynamically). And there’s also a script to convert all that to RLlib format so you can scale up training.

Sadly, though, all this went through multiple versions and is very poorly documented as of right now! We’re aware of it and will try to improve this aspect as soon as we can…

You can maybe get some help from here: Recreating Malfunctions

Also if you tell us more precisely what you are trying to do (pure IL with RLlib?) we may be able to nudge you in the right direction in the meantime.

nilabha · October 16, 2020, 12:57pm

The imitation trainer works. We have generated results for them. You could do a training and simultaneous evaluation using the script
train.py -ief baselines/custom_imitation_learning_rllib_tree_obs/ppo_imitation_tree_obs.yaml --eager --trace
(drop -e flag if you don’t want to do evaluation)
The only thing is the OR expert solution uses was for an older flatland version where the malfunction rate was different. So if you are training with malfunctions, you can workaround it by doing the below changes in the flatland source code

change below line in method malfunction_from_file in the file flatland.envs.malfunction_generators.py

mean_malfunction_rate = 1/oMPD.malfunction_rate

The documentation here https://flatland.aicrowd.com/research/baselines/imitation_learning.html is a bit old , we will update it soon.
You can refer to this Google Colab notebook also which has the details along with the results https://colab.research.google.com/drive/1oK8yaTSVYH4Av_NwmhEC9ZNBS_Wwhi18#scrollTo=P_IMrdL27Ii7
Let me know if you are facing any issues.

milva · October 16, 2020, 1:20pm

Thanks for your thorough replies @MasterScrat @nilabha.
A current goal is a mixed approach that uses IL as a baseline and enhances that with other RL methods.
The --eager argument works like a charm. It didn’t work and threw TF errors without it.

Thanks a lot!

edit) So I should change the formula if I want to use the imitation trainer. Right?

nilabha · October 16, 2020, 3:49pm

The above script was for doing PPO and IL alternately…

If you want a pure IL, you can try
train.py -ef baselines/custom_imitation_learning_rllib_tree_obs/pure_imitation_tree_obs.yaml --eager --trace