Two clarifications:

  • Will submissions be scored on the entire test set (listed as 789 sequences)?
  • One false positive per 2 hours of flight is one per 60 flights, or ~13 total across the entire validation+test set. That gives the metric exceptionally low statistical power and validity. If the intent is to apply a 60x penalty, why not simply subtract 60 correct encounters for each false alarm?

Hi @nq1,

Submissions will be scored on half of the test set; the other half is currently used for validation (the current leaderboard).
The metrics require a maximum of 1 false positive track per 2 hours of flight (every flight lasts 2 minutes). The metrics overview explains the rationale behind this metric.
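
For a rough sense of what this budget means in absolute numbers, here is a minimal sketch of the arithmetic. It assumes each sequence is a single 2-minute flight and that the scored half is 789 // 2 sequences; neither is confirmed in this thread, and the function name is purely illustrative.

```python
# Figures quoted in this thread.
FLIGHT_MINUTES = 2        # every flight lasts 2 minutes
BUDGET_WINDOW_HOURS = 2   # at most 1 false positive track per 2 hours of flight
TOTAL_SEQUENCES = 789     # test set size as listed in the question


def allowed_false_positives(num_flights,
                            flight_minutes=FLIGHT_MINUTES,
                            window_hours=BUDGET_WINDOW_HOURS):
    """Hypothetical helper: false positive budget for a set of flights.

    Assumes one sequence equals one flight of `flight_minutes` minutes.
    """
    flight_hours = num_flights * flight_minutes / 60
    return flight_hours / window_hours


# Number of flights that fit in one budget window (2 hours of 2-minute flights).
flights_per_false_positive = BUDGET_WINDOW_HOURS * 60 // FLIGHT_MINUTES

full_budget = allowed_false_positives(TOTAL_SEQUENCES)
# Assumption: the scored half is an even split of the listed sequences.
half_budget = allowed_false_positives(TOTAL_SEQUENCES // 2)

print(f"flights per allowed false positive: {flights_per_false_positive}")
print(f"budget over all sequences: {full_budget:.2f}")
print(f"budget over the scored half: {half_budget:.2f}")
```

Under these assumptions the budget is about 13 false positive tracks over all 789 sequences and about 6.6 over the scored half, so a single extra false positive track moves a submission a long way relative to the threshold.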

