I have made an open submission that can serve as a baseline for anyone who wants to get started in the competition.
The repository includes pre-trained weights, which achieved a decent score in my most recent submission.
The baseline implementation is intentionally minimal, but the base algorithm (PPO) is fairly advanced and very popular. There are many opportunities to extend the algorithm or the architecture. The baseline would also benefit from hyperparameter tuning, reward shaping, and a well-designed training procedure. The project README.md contains additional information and some possible directions for improvement.
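For anyone planning to extend the algorithm, here is a minimal sketch of the standard PPO clipped surrogate loss in plain Python. It follows the textbook formulation and is not necessarily how the baseline implements it; the function name `ppo_clip_loss` and the default `clip_eps=0.2` are illustrative assumptions.

```python
def ppo_clip_loss(ratios, advantages, clip_eps=0.2):
    """Negated PPO clipped surrogate objective, averaged over samples.

    ratios: pi_new(a|s) / pi_old(a|s) for each sampled action
    advantages: advantage estimates for the same samples
    clip_eps: clipping range (0.2 is a common default, assumed here)
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - eps, 1 + eps]
        clipped_r = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Keep the pessimistic (smaller) of the unclipped and clipped terms
        total += min(r * a, clipped_r * a)
    # Negate so that minimizing this loss maximizes the surrogate objective
    return -total / len(ratios)
```

The clipping removes the incentive to push the new policy far from the old one in a single update, which is a large part of why PPO tends to train stably. Tweaking `clip_eps` is one of the simpler hyperparameter experiments to try.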
I can provide more detail if there is interest. As other participants post higher-scoring submissions, I will also enhance this baseline. Please consider sharing your extensions, or at least a comparison against this baseline, with the community.