Baselines show improved performance

Hi all,

I’m still working on the Baselines Repo, but the agents are starting to show more complex behaviour.

Here I use a dueling double DQN agent with the tree-based observation and the shortest-path predictor, as explained in the baselines repo.
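
For readers less familiar with the agent type, here is a minimal NumPy sketch of the two ideas behind a dueling double DQN: the dueling aggregation of a state value and per-action advantages into Q-values, and the double-DQN target, where the online network picks the next action and the target network evaluates it. This is not the code from the baselines repo; the function names, array shapes and the default `gamma` are assumptions made for illustration only.

```python
import numpy as np


def dueling_q_values(value, advantage):
    """Combine V(s) and A(s, a) into Q(s, a) with the mean-advantage baseline.

    value:     array of shape (batch, 1)
    advantage: array of shape (batch, n_actions)
    """
    return value + advantage - advantage.mean(axis=1, keepdims=True)


def double_dqn_target(reward, done, q_next_online, q_next_target, gamma=0.99):
    """Compute double-DQN bootstrap targets for a batch of transitions.

    reward, done:   arrays of shape (batch,); done is 1.0 for terminal steps
    q_next_online:  Q-values of the next state from the online network
    q_next_target:  Q-values of the next state from the target network
    gamma:          discount factor (hypothetical default)
    """
    # The online network selects the greedy next action ...
    best_action = q_next_online.argmax(axis=1)
    # ... and the target network supplies the value of that action.
    next_q = q_next_target[np.arange(len(best_action)), best_action]
    # Terminal transitions do not bootstrap from the next state.
    return reward + gamma * (1.0 - done) * next_q
```

Selecting the action with one network and evaluating it with the other is what reduces the overestimation that a plain max over the target network tends to produce.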

Any suggestions on how we can improve the observation or the predictor to get even better performance? Happy to discuss ideas.