Thanks, I think it works.
But the evaluator prints this deprecation message:
$ flatland-evaluator --tests ./scratch/test-envs/ --shuffle False
…
Evaluating Test_0/Level_0.pkl (0/28)
DEPRECATED - RailEnv arg: malfunction_and_process_data - use malfunction_generator
Percentage for test 0, level 0: 0.6
…
$ python run.py
/home/user/anaconda3/envs/flatland-rl/lib/python3.6/site-packages/torch/serialization.py:649: SourceChangeWarning: source code of class 'torch.nn.modules.linear.Linear' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
DEPRECATED - RailEnv arg: malfunction_and_process_data - use malfunction_generator
Env Path : ./scratch/test-envs/Test_0/Level_0.pkl
...
… and single-agent training:
$ python reinforcement_learning/single_agent_training.py
…
/home/user/anaconda3/envs/flatland-rl/lib/python3.6/site-packages/flatland/envs/rail_generators.py:781: UserWarning: Could not set all required cities!
"Could not set all required cities!")
/home/user/anaconda3/envs/flatland-rl/lib/python3.6/site-packages/flatland/envs/rail_generators.py:703: UserWarning: [WARNING] Changing to Grid mode to place at least 2 cities.
warnings.warn("[WARNING] Changing to Grid mode to place at least 2 cities.")
…
Training 1 agents on 25x25 Episode 0 Average Score: -0.998 Dones: 0.00% Epsilon: 1.00 Action Probabilities: [0.19451372 0.19451372 0.19201995 0.18204489 0.23690773]
/home/user/anaconda3/envs/flatland-rl/lib/python3.6/site-packages/flatland/envs/rail_generators.py:781: UserWarning: Could not set all required cities!
"Could not set all required cities!")
/home/user/anaconda3/envs/flatland-rl/lib/python3.6/site-packages/flatland/envs/rail_generators.py:703: UserWarning: [WARNING] Changing to Grid mode to place at least 2 cities.
warnings.warn("[WARNING] Changing to Grid mode to place at least 2 cities.")
Training 1 agents on 25x25 Episode 100 Average Score: -0.448 Dones: 78.00% Epsilon: 0.74 Action Probabilities: [0.20166263 0.2048007 0.20314909 0.20204801 0.18833957]
Training 1 agents on 25x25 Episode 200 Average Score: -0.322 Dones: 92.00% Epsilon: 0.55 Action Probabilities: [0.16372086 0.21758342 0.22398293 0.20844126 0.18627152]
Training 1 agents on 25x25 Episode 300 Average Score: -0.280 Dones: 94.00% Epsilon: 0.40 Action Probabilities: [0.14407814 0.21219257 0.22605965 0.19928484 0.21838479]
Training 1 agents on 25x25 Episode 400 Average Score: -0.260 Dones: 94.00% Epsilon: 0.30 Action Probabilities: [0.1091662 0.23197817 0.20431018 0.2002635 0.25428195]
Training 1 agents on 25x25 Episode 499 Average Score: -0.220 Dones: 96.00% Epsilon: 0.22 Action Probabilities: [0.10974794 0.2326567 0.17332144 0.21012715 0.27414678]
Maybe something is still misconfigured?
I got a learning curve as output, but no view of the agent's behavior.
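
In case it helps to check the trend without the plot, here is a minimal sketch that pulls episode, average score and done rate out of the progress lines above. The helper `parse_progress` and its regex are my own and not part of Flatland; I am assuming the line format stays exactly as printed in this run.

```python
import re

# Hypothetical helper, not part of Flatland: matches the progress lines
# printed by single_agent_training.py in the run above.
PROGRESS = re.compile(
    r"Episode\s+(\d+)\s+Average Score:\s+(-?\d+\.\d+)\s+Dones:\s+(\d+\.\d+)%"
)

def parse_progress(lines):
    """Return (episode, average_score, dones_percent) for each progress line."""
    rows = []
    for line in lines:
        match = PROGRESS.search(line)
        if match:  # warning lines and other output are skipped
            rows.append(
                (int(match.group(1)), float(match.group(2)), float(match.group(3)))
            )
    return rows

# Two progress lines copied from the output above, plus one warning line.
log = [
    "Training 1 agents on 25x25 Episode 0 Average Score: -0.998 Dones: 0.00% Epsilon: 1.00",
    'warnings.warn("[WARNING] Changing to Grid mode to place at least 2 cities.")',
    "Training 1 agents on 25x25 Episode 499 Average Score: -0.220 Dones: 96.00% Epsilon: 0.22",
]

for episode, score, dones in parse_progress(log):
    print(episode, score, dones)
```

Feeding it the full console output (for example a saved log file read with `readlines()`) should give one row per printed episode, which is enough to see whether the score and done rate keep improving.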