According to the competition page, we are evaluated on the following metrics:
- Number of laps
- Lap Time
- Number of resets to the start line
- Number of objects avoided
However, among the agents I have trained, the faster ones (which follow the racing line) score much lower than the agents that drive as slowly as possible along the centerline. This has been consistent across all my evaluations.
Also, the numbers I see on the scoreboard are very close to the mean rewards my agents get across multiple runs. Does the scoreboard currently reflect the mean reward our agents accumulate across multiple runs? Could the exact formula for calculating the score be revealed?
Is this a bug or am I missing something here?
No bug… I have the same problem. It seems the score function on the round 2 board is different from the one in round 1.
Sorry, I missed replying to this post earlier.
The reward function for both rounds is based on staying close to the center. This is a proxy reward, but it's good enough because the action space does not allow stopping. The score on the simulation leaderboard is indeed the mean cumulative reward over multiple episodes. The maximum possible reward may differ between the two rounds because the tracks are different.
But the main idea of the competition is sim2real transfer, and the formula for the real track is based on waypoints crossed and time. The final prizes are based on the real-track leaderboard, with a small weight given to the simulation scores. I'll make a separate post to clarify the formula for the real track.
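For anyone wondering why a slow centerline agent can outscore a fast racing-line agent under the simulation scoring described above, here is a rough sketch. The exact reward formula has not been published, so `centerline_reward`, the linear falloff, the track half-width, and the step counts are all made-up assumptions for illustration only. The one part taken from the reply above is that the leaderboard score is the mean of the cumulative per-episode reward.

```python
from statistics import mean


def centerline_reward(distance_from_center, half_track_width):
    """Hypothetical proxy reward (an assumption, not the official formula):
    1.0 on the centerline, falling linearly to 0.0 at the track edge."""
    offset = min(abs(distance_from_center), half_track_width)
    return 1.0 - offset / half_track_width


def episode_return(distances, half_track_width):
    """Cumulative reward for one episode: sum of per-step rewards."""
    return sum(centerline_reward(d, half_track_width) for d in distances)


def leaderboard_score(episodes, half_track_width):
    """Mean cumulative reward over multiple episodes."""
    return mean(episode_return(ep, half_track_width) for ep in episodes)


HALF_WIDTH = 0.4  # made-up track half-width

# Slow agent: hugs the centerline, so it takes many steps, each with full reward.
slow_centerline = [[0.0] * 100 for _ in range(3)]

# Fast agent: cuts corners on the racing line and finishes in fewer steps,
# each with a reduced reward because it is off-center.
fast_racing_line = [[0.2] * 60 for _ in range(3)]

slow_score = leaderboard_score(slow_centerline, HALF_WIDTH)
fast_score = leaderboard_score(fast_racing_line, HALF_WIDTH)
print(slow_score, fast_score)
```

With these invented numbers the slow agent collects more total reward, both because each step is worth more and because it accumulates reward over more steps. That matches the behaviour reported in the first post, but the real reward shaping may differ.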