Solution Codes and Approaches

nilabha · December 5, 2019, 9:04pm

I have put up some code here

It includes actorcritictrainer.py, which implements an actor-critic approach, and ESStrategyTraining.py, which implements an evolution strategy approach.
The results seem to be similar to those of the Duelling Double DQN approach. I have saved sample results and pre-trained weights.
All of this was done using the stock observations.
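
For readers unfamiliar with the evolution strategy idea, here is a minimal sketch of the general technique on a toy problem. It is not the code in ESStrategyTraining.py: the function name `evolution_strategy`, the hyperparameters, and the toy fitness function are all assumptions made for illustration. In the real setting the fitness would be the episode reward obtained by running a policy with the perturbed weights in the environment.

```python
import random


def evolution_strategy(fitness, dim, iterations=200, population=50,
                       sigma=0.1, lr=0.01, seed=0):
    """Maximise `fitness` with a basic evolution strategy (illustrative only)."""
    rng = random.Random(seed)
    theta = [0.0] * dim  # parameters being optimised (e.g. policy weights)
    for _ in range(iterations):
        # Sample one Gaussian perturbation per population member.
        noises = [[rng.gauss(0, 1) for _ in range(dim)]
                  for _ in range(population)]
        # Score each perturbed parameter vector.
        rewards = [fitness([t + sigma * n for t, n in zip(theta, noise)])
                   for noise in noises]
        # Normalise rewards so the update does not depend on reward scale.
        mean = sum(rewards) / population
        std = (sum((r - mean) ** 2 for r in rewards) / population) ** 0.5 or 1.0
        advantages = [(r - mean) / std for r in rewards]
        # Move theta along the reward-weighted average of the perturbations.
        for i in range(dim):
            grad = sum(a * noise[i] for a, noise in zip(advantages, noises))
            theta[i] += lr * grad / (population * sigma)
    return theta


# Toy fitness (hypothetical): negative squared distance to a fixed target.
TARGET = [0.5, -0.3]


def toy_fitness(params):
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


best = evolution_strategy(toy_fitness, dim=2)
print("fitness at start:", toy_fitness([0.0, 0.0]))
print("fitness after ES:", toy_fitness(best))
```

No gradients of the fitness are needed, which is why this family of methods can be applied directly to episode rewards.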

Adding to Erik’s comments, my observation is:

These models do not improve even with longer training, and they all reach comparable performance, which suggests that we need better feature engineering.

Next, I plan to do some visualizations and add better documentation to the code.

Any comments/suggestions are most welcome.

nilabha · January 4, 2020, 9:09pm

I have added another code file with a different approach that does not use a model.

The code can be found in the local GitHub location

For a simple demonstration of how we solve a dense railway network, just run the file
MultipleAgentNavigationObsConflict.py.
This file does not need any packages beyond the ones required for flatland, and it runs with the latest flatland-rl version, 2.1.10.

RomanChernenko · January 4, 2020, 9:21pm

Hello @nilabha,

What results do you have with these approaches?

nilabha · January 4, 2020, 9:41pm

I haven’t submitted it yet, but I can share the results for a few envs from the local evaluation:

Evaluation Number : 1
Reward : -43.83333333333334

====================================================================================================
Evaluation Number : 1
Current Env Path : ./test-envs/Test_5/Level_1.pkl
Env Creation Time : 0.42258167266845703
Number of Steps : 1120
Mean/Std of Time taken by Controller : 0.01895513470683779 0.0018288652394336735
Mean/Std of Time per Step : 0.1480530451451029 0.00961317459096298

Evaluation Number : 2
Reward : -43.41666666666668

====================================================================================================
Evaluation Number : 2
Current Env Path : ./test-envs/Test_3/Level_0.pkl
Env Creation Time : 0.29577183723449707
Number of Steps : 960
Mean/Std of Time taken by Controller : 0.02316151708364487 0.002471594250438426
Mean/Std of Time per Step : 0.14834324022134146 0.01293447105111391

Evaluation Number : 3
Reward : -52.00000000000002

====================================================================================================
Evaluation Number : 3
Current Env Path : ./test-envs/Test_6/Level_0.pkl
Env Creation Time : 1.5495717525482178
Number of Steps : 1760
Mean/Std of Time taken by Controller : 0.02436668398705396 0.0039495335690074304
Mean/Std of Time per Step : 0.19943869560956956 0.024274925544436107
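
To compare runs like the three above, the per-environment numbers can be aggregated with a few lines of Python. This is only a sketch: the helper `parse_field` is hypothetical, and the `LOG` string is an abridged copy of the output pasted above rather than something the evaluator produces in this exact form.

```python
import re
from statistics import mean

# Abridged copy of the local evaluation output shown above.
LOG = """\
Evaluation Number : 1
Reward : -43.83333333333334
Number of Steps : 1120
Evaluation Number : 2
Reward : -43.41666666666668
Number of Steps : 960
Evaluation Number : 3
Reward : -52.00000000000002
Number of Steps : 1760
"""


def parse_field(log, name):
    """Return every numeric value logged as '<name> : <value>' (hypothetical helper)."""
    pattern = rf"^{re.escape(name)}[ \t]*:[ \t]*(-?\d+(?:\.\d+)?)[ \t]*$"
    return [float(value) for value in re.findall(pattern, log, flags=re.MULTILINE)]


rewards = parse_field(LOG, "Reward")
steps = parse_field(LOG, "Number of Steps")

print("environments evaluated:", len(rewards))
print("mean reward:", mean(rewards))
print("total steps:", int(sum(steps)))
```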
