This includes the file actorcritictrainer.py, which implements an actor-critic approach, and ESStrategyTraining.py, which implements an evolutionary strategy approach.
The results seem to be similar to those of the Duelling Double DQN approach. I have saved sample results and pre-trained weights.
Both were trained using the stock observations.
Adding to Erik’s comments, my observations are:
These models do not improve even after training for longer periods, and they all show comparable performance, which suggests that we need to do better feature engineering.
Next, I plan to do some visualizations and add better documentation to the code.
I have added another code file with a different approach that does not use a model.
The code can be found in the local GitHub location.
For a simple demonstration of how we solve a dense railway network, run the file MultipleAgentNavigationObsConflict.py.
This file needs no packages beyond those required for flatland and can be run with the latest flatland-rl version, 2.1.10.
I haven’t submitted it yet, but I can share the results for a few environments from the local evaluation:
Evaluation Number : 1
Reward : -43.83333333333334
====================================================================================================
Evaluation Number : 1
Current Env Path : ./test-envs/Test_5/Level_1.pkl
Env Creation Time : 0.42258167266845703
Number of Steps : 1120
Mean/Std of Time taken by Controller : 0.01895513470683779 0.0018288652394336735
Mean/Std of Time per Step : 0.1480530451451029 0.00961317459096298
Evaluation Number : 2
Reward : -43.41666666666668
====================================================================================================
Evaluation Number : 2
Current Env Path : ./test-envs/Test_3/Level_0.pkl
Env Creation Time : 0.29577183723449707
Number of Steps : 960
Mean/Std of Time taken by Controller : 0.02316151708364487 0.002471594250438426
Mean/Std of Time per Step : 0.14834324022134146 0.01293447105111391
Evaluation Number : 3
Reward : -52.00000000000002
====================================================================================================
Evaluation Number : 3
Current Env Path : ./test-envs/Test_6/Level_0.pkl
Env Creation Time : 1.5495717525482178
Number of Steps : 1760
Mean/Std of Time taken by Controller : 0.02436668398705396 0.0039495335690074304
Mean/Std of Time per Step : 0.19943869560956956 0.024274925544436107
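To compare runs more easily, the log above could be summarized with a small script. This is only a sketch: it assumes the log keeps the "Reward : ..." and "Number of Steps : ..." line format shown above, and the function name summarize is my own, not part of flatland or of the evaluation script.

```python
import re
from statistics import mean

# Matches lines such as "Reward : -43.83333333333334"
REWARD_RE = re.compile(r"^Reward\s*:\s*(-?\d+(?:\.\d+)?)\s*$", re.MULTILINE)
# Matches lines such as "Number of Steps : 1120"
STEPS_RE = re.compile(r"^Number of Steps\s*:\s*(\d+)\s*$", re.MULTILINE)


def summarize(log_text):
    """Summarize rewards and step counts found in a local evaluation log.

    Hypothetical helper: assumes the line format of the log pasted above.
    """
    rewards = [float(r) for r in REWARD_RE.findall(log_text)]
    steps = [int(s) for s in STEPS_RE.findall(log_text)]
    return {
        "episodes": len(rewards),
        "mean_reward": mean(rewards) if rewards else None,
        "total_steps": sum(steps),
    }


# Reward and step lines copied from the three evaluations above.
sample = """\
Evaluation Number : 1
Reward : -43.83333333333334
Number of Steps : 1120
Evaluation Number : 2
Reward : -43.41666666666668
Number of Steps : 960
Evaluation Number : 3
Reward : -52.00000000000002
Number of Steps : 1760
"""

if __name__ == "__main__":
    print(summarize(sample))
```

The same function can be pointed at a saved log file by reading the file into a string first.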