Scoring/Metrics

david_lander · July 7, 2021, 5:30pm

Two clarifications:

1. Will submissions be scored on the entire test set (listed as 789 sequences)?
2. One false positive per 2 hours of flight is one per 60 flights, or ~13 in total across the entire validation + test set. That gives the metric exceptionally low statistical power and validity. If the intent is to apply a 60x penalty, why not simply subtract 60 correct encounters for each false alarm?
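The arithmetic in the second question can be checked with a short sketch. It assumes every flight (sequence) lasts exactly 2 minutes, which is what makes 2 hours equal 60 flights, and uses the 789-sequence count quoted above. `penalized_score` is a hypothetical name for the alternative scoring the question proposes (subtract 60 correct encounters per false alarm); it is not the challenge's actual metric.

```python
# Sketch of the false-positive budget arithmetic from the question.
# Assumption: every flight (sequence) lasts exactly 2 minutes.
FLIGHT_MINUTES = 2
BUDGET_WINDOW_MINUTES = 2 * 60  # "1 false positive per 2 hours of flight"
TOTAL_SEQUENCES = 789           # validation + test, as listed

# How many flights share a single allowed false positive.
flights_per_false_positive = BUDGET_WINDOW_MINUTES // FLIGHT_MINUTES

# Expected number of allowed false positives over all sequences.
allowed_false_positives = TOTAL_SEQUENCES / flights_per_false_positive


def penalized_score(correct_encounters: int, false_alarms: int,
                    penalty: int = flights_per_false_positive) -> int:
    """Hypothetical alternative from the question: subtract a fixed number
    of correct encounters (default 60) for every false alarm."""
    return correct_encounters - penalty * false_alarms


if __name__ == "__main__":
    print(flights_per_false_positive)         # 60
    print(round(allowed_false_positives, 2))  # 13.15
    print(penalized_score(500, 3))            # 320
```

The point of the question is visible in the numbers: the whole budget across all sequences is only about 13 false positives, so a handful of extra false alarms decides pass or fail.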

zontakm9 · July 9, 2021, 5:21pm

1. Submissions will be scored on half of the test set; the other half is currently used for validation (the current leaderboard).
2. The metrics require a maximum of 1 false positive track per 2 hours of flight (each flight lasts 2 minutes). The overview of the metrics explains the rationale behind this metric.
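The requirement above can be read as a simple rate check. The sketch below is an assumption-based illustration, not the official scoring code: the function names `meets_fp_requirement` and `max_allowed_fp` are hypothetical, and it assumes a fixed 2-minute flight length. It cross-multiplies so integer inputs compare exactly instead of going through floating-point division.

```python
# Hypothetical check of "at most 1 false positive track per 2 hours of flight".
# Assumption: every flight lasts a fixed number of minutes (2 by default).


def meets_fp_requirement(fp_tracks: int, num_flights: int,
                         flight_minutes: int = 2,
                         max_fp: int = 1,
                         window_minutes: int = 120) -> bool:
    """Return True if fp_tracks / total_minutes <= max_fp / window_minutes.

    Cross-multiplied so that integer inputs compare exactly.
    """
    total_minutes = num_flights * flight_minutes
    return fp_tracks * window_minutes <= max_fp * total_minutes


def max_allowed_fp(num_flights: int, flight_minutes: int = 2,
                   max_fp: int = 1, window_minutes: int = 120) -> int:
    """Largest whole number of false positive tracks that still passes."""
    return (max_fp * num_flights * flight_minutes) // window_minutes


if __name__ == "__main__":
    print(max_allowed_fp(60))            # 1  (60 flights = 2 hours)
    print(meets_fp_requirement(1, 60))   # True
    print(meets_fp_requirement(2, 60))   # False
```

With these defaults, 60 two-minute flights make up exactly 2 hours and therefore allow a single false positive track.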