Hello,
Can you please describe the evaluation metrics of the competition? What do the score and secondary score mean?
Hi @RomanChernenko,
The score is RMSE and the secondary score is coverage.
cc: @masorx for confirming it, I will change the header on the leaderboard once confirmed.
Indeed, the principal score is RMSE on the 2D distance. There’s just one tweak: we only take the best 90% of results (those with the smallest errors) to calculate the score, getting rid of some potentially bad outliers.
Coverage is secondary; it simply shows how many of the missing locations have been predicted. Predicting all 100% may lead to worse results (or maybe not for you?). It’s generally a skill in itself to identify the locations that you are not confident in and exclude them (give a NaN) rather than predict them. However, a minimum of 50% is required, and we may also decide to give an award to a team that has a high coverage while maintaining a good RMSE, even if it’s not the very best.
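For anyone who wants to estimate these two numbers locally, here is a minimal sketch in Python of how I read the description above. It is not the organizers' scoring code. The assumptions are mine: the 2D distance is taken as the haversine great-circle distance on a spherical Earth, the 90% cut is applied to the predicted points only (not to all missing points), and the function names `haversine_m` and `score_submission` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle (2D) distance in meters between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def score_submission(truth, predicted, keep_fraction=0.9, min_coverage=0.5):
    """Return (rmse_m, coverage).

    truth:     list of (lat, lon) reference positions.
    predicted: list of (lat, lon), or None where no prediction (NaN) was given.
    rmse_m is None when coverage is below the required minimum.
    """
    # 2D error in meters for every point that was actually predicted, smallest first.
    errors = sorted(
        haversine_m(t[0], t[1], p[0], p[1])
        for t, p in zip(truth, predicted)
        if p is not None
    )
    coverage = len(errors) / len(truth)
    if coverage < min_coverage:
        return None, coverage

    # Keep only the best 90% of the predicted points (assumed interpretation).
    n_keep = max(1, math.floor(keep_fraction * len(errors)))
    kept = errors[:n_keep]
    rmse = math.sqrt(sum(e * e for e in kept) / len(kept))
    return rmse, coverage
```

With this reading, a submission that skips its least confident points can lower its RMSE as long as coverage stays at or above 50%, which is the trade-off described above.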
Thank you for your answers. I have a few additional questions about metrics.
Why do you use 2D distance? We need to predict the height of the aircraft, no?
Do you check the submission file fully after each submission? So the score on the leaderboard is final, right?
Hi Roman,
Altitude is of interest, but there are two main reasons not to check it here:
A decent primer on the differences between the two altitude types (GPS and barometric) is here: https://xcmag.com/news/gps-versus-barometric-altitude-the-definitive-answer/
The score is in meters!
Just to give some more context: super expensive, purpose-built localization solutions focused on small areas near airports often get tens to hundreds of meters of error. But to be useful it does not have to be this good; there are many purposes where 1000 m may well be sufficient (and better than classic radar).
What is the ground truth you compare the contestants’ results against? Is it just the ADS-B location of the aircraft? (Because the ADS-B location is erroneous in itself.) Or did you equip the aircraft with a special high-precision “Ground-Truth-GPS-Receiver”? Or did you produce the data by simulation?
In other words, should we try to reproduce the aircraft’s internal GPS/IMU-Location it would normally transmit via ADS-B, or the true location of the aircraft?
It’s the aircraft-reported ADS-B data, which as you point out, can be erroneous in theory. We describe this in detail here: https://competition.opensky-network.org/documentation.html (link also in the competition).
Mostly, the errors are small enough not to matter, since the localization error will be orders of magnitude larger. Very rare exceptions may see a large error (if using dead reckoning/inertial positioning), and we don’t have any better ground truth. Nonetheless, these need to be located in a real-world system.