Two clarifications:
- Will submissions be scored on the entire test set (listed as 789 sequences)?
- One false positive per 2 hours of flight is one per 60 flights, or ~13 total across the entire validation+test set. A count that small gives the metric exceptionally low statistical power and validity. If the intent is to apply a 60x penalty, why not simply subtract 60 correct encounters for each false alarm?
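
To make the arithmetic concrete, here is a rough sketch of both numbers in question. It assumes the 789 sequences quoted above are the ones being scored, and that false alarms behave roughly like Poisson counts (my assumption, not something the organizers have stated). The function names and the `penalized_score` form are hypothetical illustrations of the suggestion, not the challenge's actual metric.

```python
import math

SEQUENCES = 789               # sequence count quoted in the question above
FLIGHTS_PER_FALSE_ALARM = 60  # "one per 60 flights", as stated above


def false_alarm_budget(sequences=SEQUENCES, flights_per_fa=FLIGHTS_PER_FALSE_ALARM):
    """Number of false alarms allowed across all scored sequences."""
    return sequences / flights_per_fa


def relative_std(expected_count):
    """Relative standard deviation of a count, ASSUMING it is Poisson
    distributed (std = sqrt(mean)). This is an illustrative assumption."""
    return 1.0 / math.sqrt(expected_count)


def penalized_score(correct_encounters, false_alarms, penalty=60):
    """Hypothetical alternative metric: each false alarm costs `penalty`
    correct encounters, instead of applying a hard false-alarm threshold."""
    return correct_encounters - penalty * false_alarms


if __name__ == "__main__":
    budget = false_alarm_budget()
    print(f"false-alarm budget: {budget:.2f}")
    print(f"relative std under Poisson assumption: {relative_std(budget):.2f}")
    print(f"example penalized score: {penalized_score(100, 1)}")
```

Under these assumptions the budget works out to about 13 false alarms with a relative spread of roughly 28%, so a difference of a few false alarms between two submissions could easily be noise.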