Evaluation metrics

Hello,

Can you please describe the evaluation metrics of the competition? What do the score and the secondary score mean?

Hi @RomanChernenko,

The score is RMSE and the secondary score is coverage.

cc: @masorx for confirming it, I will change the header on the leaderboard once confirmed.


Indeed, the principal score is RMSE on the 2D distance. There is just one tweak: we only take the best 90% of results when calculating the score, discarding some potentially bad outliers.
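A minimal sketch of such a trimmed RMSE in Python (the function name and the exact trimming rule, i.e. keeping the 90% smallest per-sample errors, are assumptions rather than the organizers' actual code):

```python
import numpy as np

def trimmed_rmse(errors_m, keep_fraction=0.9):
    # Sort the per-sample 2D distance errors (in meters) and keep
    # only the smallest 90%, discarding potential bad outliers.
    errors = np.sort(np.asarray(errors_m, dtype=float))
    n_keep = int(len(errors) * keep_fraction)
    kept = errors[:n_keep]
    # RMSE over the kept errors.
    return float(np.sqrt(np.mean(kept ** 2)))

# A single huge outlier falls into the discarded 10% and does not
# affect the score:
errs = [100.0] * 9 + [100000.0]
print(trimmed_rmse(errs))  # -> 100.0
```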

Coverage is secondary; it simply shows how many of the missing locations have been predicted. Predicting all 100% may lead to worse results (or maybe not for you?). It is a skill in itself to identify the locations you are not confident in and exclude them (give a NaN). However, a minimum of 50% coverage is required, and we may also decide to give an award to a team that achieves high coverage while maintaining a good RMSE, even if it is not the very best.
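The coverage described above could be computed along these lines (a sketch; the NaN convention for excluded locations follows the post, everything else is an assumption):

```python
import numpy as np

def coverage(predicted_x):
    # Fraction of required locations that were actually predicted;
    # an excluded (unpredicted) location is marked with NaN.
    preds = np.asarray(predicted_x, dtype=float)
    return float(np.mean(~np.isnan(preds)))

# 3 of 4 locations predicted, one deliberately excluded as NaN:
print(coverage([12.3, 45.6, float("nan"), 78.9]))  # -> 0.75
```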


Hello @shivam and @masorx

Thank you for your answers. I have a few additional questions about metrics.
Why do you use the 2D distance? We also need to predict the altitude of the aircraft, no?
Do you fully check the submission file after each submission? So the score on the leaderboard is final, right?

Hi Roman,

Altitude is of interest, but there are two main reasons not to score it here:

  1. Barometric altitude is provided through other means. The aircraft can measure it without GPS, and it has long been a staple of aviation, provided by a technology older than ADS-B (Mode C). That is why you get the barometric altitude even for those aircraft whose geometric position has been removed. It will be very close to the geometric altitude wherever both are provided.

A decent primer on the differences between the two altitude types is here: https://xcmag.com/news/gps-versus-barometric-altitude-the-definitive-answer/

  2. Traditional multilateration has always been terrible with altitude. The geometry of the receivers (basically all lying in a plane on the ground) means that algorithms can't really deal with it. Multilateration is the baseline technology in aviation, which is why we wanted to keep altitude out for this round. We may explore this further in the future if we feel 3D works well for the competitors' solutions!
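To illustrate point 1 above: barometric altitude is derived from static pressure alone, with no GPS involved. A simplified ISA-style conversion (constants from the standard atmosphere model; this is only an illustration, not the actual avionics implementation):

```python
def pressure_altitude_m(p_hpa, p0_hpa=1013.25):
    # ISA barometric formula: altitude from static pressure,
    # relative to the standard sea-level pressure of 1013.25 hPa.
    return 44330.0 * (1.0 - (p_hpa / p0_hpa) ** (1.0 / 5.255))

# Standard sea-level pressure corresponds to 0 m:
print(round(pressure_altitude_m(1013.25), 1))  # -> 0.0
# ~700 hPa corresponds to roughly 3000 m in the standard atmosphere:
print(round(pressure_altitude_m(700.0)))
```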

Hello @masorx

What units (m or km) do you use for the RMSE calculation?

The score is in meters!

Just to give some more context: super-expensive, purpose-built localization solutions focused on small areas near airports often achieve errors of tens to hundreds of meters. But a solution does not have to be this good to be useful; for many purposes an error of 1000 m may well be sufficient (and still better than classic radar).
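One common way to turn two latitude/longitude pairs into a 2D distance in meters is the haversine formula; whether the scoring uses exactly this formula is an assumption:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in meters on a sphere with the mean
    # Earth radius; inputs are in degrees.
    r = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2)
    return 2.0 * r * math.asin(math.sqrt(a))

# One degree of latitude is roughly 111 km:
print(round(haversine_m(47.0, 8.0, 48.0, 8.0)))
```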

What is the ground truth you compare the contestants' results against? Is it just the ADS-B location of the aircraft? (Because the ADS-B location is erroneous in itself.) Or did you equip the aircraft with a special high-precision ground-truth GPS receiver? Or did you produce the data by simulation?

In other words, should we try to reproduce the aircraft’s internal GPS/IMU-Location it would normally transmit via ADS-B, or the true location of the aircraft?

It’s the aircraft-reported ADS-B data, which as you point out, can be erroneous in theory. We describe this in detail here: https://competition.opensky-network.org/documentation.html (link also in the competition).
Mostly, errors are small enough not to matter, the localization error will be orders of magnitudes larger. Very rare exceptions may see a large error (if using dead reckoning/inertial positioning) and we don’t have any better ground truth. Nonetheless, these need to be located in a real-world system.
