I wanted to share some ideas on how to approach this challenge. Anyone who wants to chip in is more than welcome to build on them:
- Combining imitation learning and reinforcement learning. One idea is to pretrain the model on images of the racetracks captured during simulator runs, so the agent can anticipate the upcoming environment and is more likely to stay on track. There are multiple ways to feed this to the agent, for example by encoding the images into a representation that the rest of the setup can easily consume.
- Knowledge distillation from multiple sensor inputs. Since the evaluation procedure only allows a subset of the sensors available during training, it could be worth exploring best practices for transferring knowledge from the full training sensor suite to the test-time subset, e.g., through pre-trained encoder models.
- Another thing I am planning to try during my run with this challenge is comparing on-policy vs. off-policy agents. With the baseline Soft Actor-Critic being off-policy, an on-policy algorithm like PPO could be an interesting point of comparison for this challenge.
- Also, we need to keep in mind the second part of the challenge, which concerns the safety aspect of the model. It would be interesting to see how the points mentioned above could be useful for the safety part as well. Training a separate model that focuses more on the safety rewards, and then combining the fast model and the safety model in some sort of ensemble, could also be a promising direction.
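To make the imitation-pretraining idea above a bit more concrete, here is a minimal toy sketch: encode each track image into coarse features, then fit a steering policy on (features, expert action) pairs by behavior cloning before any RL fine-tuning. Everything here (the pooling encoder, the brightness-based expert, the learning rates) is a made-up stand-in, not part of the actual simulator API.

```python
import random

def encode_image(pixels, patch=4):
    """Toy encoder: average-pool a flat grayscale image into coarse features."""
    return [sum(pixels[i:i + patch]) / patch for i in range(0, len(pixels), patch)]

def pretrain_bc(dataset, lr=0.05, epochs=200):
    """Fit a linear steering policy on (features, expert_action) pairs via SGD
    on squared error -- a stand-in for behavior cloning before RL fine-tuning."""
    n = len(dataset[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for feats, action in dataset:
            pred = sum(wi * fi for wi, fi in zip(w, feats))
            err = pred - action
            w = [wi - lr * err * fi for wi, fi in zip(w, feats)]
    return w

# Hypothetical data: 8-pixel "images" where the expert steers toward the
# brighter side of the frame (a proxy for "toward the visible track").
random.seed(0)
data = []
for _ in range(50):
    img = [random.random() for _ in range(8)]
    feats = encode_image(img)
    data.append((feats, feats[1] - feats[0]))

weights = pretrain_bc(data)
feats = encode_image([0.1] * 4 + [0.9] * 4)  # right half much brighter
steer = sum(w * f for w, f in zip(weights, feats))
```

The resulting `weights` would then serve as the initialization that RL fine-tunes, rather than starting the agent from scratch.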
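The sensor-distillation point could look something like the following sketch: a teacher policy sees the full training-time sensor suite, and a camera-only student is regressed onto the teacher's outputs (soft targets) so it works under the restricted evaluation sensors. The sensor names, the linear teacher, and the training loop are all illustrative assumptions.

```python
def teacher(obs):
    """Hypothetical teacher with access to all sensors (camera, lidar, imu)."""
    camera, lidar, imu = obs
    return 0.5 * camera + 0.3 * lidar + 0.2 * imu

def distill_student(transitions, lr=0.1, epochs=300):
    """Train a camera-only linear student to match the teacher's outputs,
    transferring knowledge from the richer sensor suite to the test subset."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for obs in transitions:
            camera = obs[0]
            target = teacher(obs)      # soft label from the full-sensor teacher
            pred = w * camera + b
            err = pred - target
            w -= lr * err * camera
            b -= lr * err
    return w, b

# Toy logs where the sensor readings are correlated, so the camera alone
# carries enough signal for the student to recover the teacher's behavior.
sensor_logs = [(x / 10, x / 10, x / 10) for x in range(11)]
w, b = distill_student(sensor_logs)
```

In practice the "student" would more likely be a pre-trained image encoder fine-tuned against the teacher's features, but the target-matching loss is the same idea.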
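The practical difference behind the on-policy vs. off-policy comparison is mostly in how data is handled, which this minimal sketch tries to show: SAC-style training reuses old transitions from a replay buffer, while PPO-style training consumes a fresh rollout and then discards it.

```python
import random
from collections import deque

class ReplayBuffer:
    """Off-policy (SAC-style): transitions are stored once and reused
    across many gradient updates, even after the policy has changed."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

class OnPolicyRollout:
    """On-policy (PPO-style): data collected by the current policy is used
    for one update phase and then thrown away."""
    def __init__(self):
        self.batch = []

    def add(self, transition):
        self.batch.append(transition)

    def consume(self):
        batch, self.batch = self.batch, []  # fresh data needed after update
        return batch
```

This is also why the comparison matters for a simulator-heavy challenge: off-policy methods tend to be more sample-efficient (data reuse), while on-policy methods are often more stable per update.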
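One simple shape the fast-model/safety-model ensemble could take is a switching rule: run the lap-time-optimized policy by default, and defer to the safety-focused policy whenever an estimated risk signal is high. Both policies and the curvature-times-speed risk proxy below are hypothetical placeholders.

```python
def fast_policy(state):
    """Hypothetical lap-time-optimized policy: aggressive throttle and steering."""
    return {"throttle": 1.0, "steer": state["curvature"] * 2.0}

def safe_policy(state):
    """Hypothetical safety-focused policy trained on the safety rewards."""
    return {"throttle": 0.4, "steer": state["curvature"]}

def ensemble(state, risk_threshold=0.5):
    """Switching ensemble: use the fast model by default, but hand control
    to the safety model when a (stand-in) risk estimate exceeds a threshold."""
    risk = min(1.0, abs(state["curvature"]) * state["speed"])
    return safe_policy(state) if risk > risk_threshold else fast_policy(state)
```

A learned risk estimator (e.g., a value function trained on the safety rewards) could replace the hand-coded proxy, which would tie this back to the other ideas above.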