Data from the future

I just started analyzing the competition data and realized that a large part of the tracks are simply straight lines.
So if you could find the coordinates and the time difference for the two distant endpoints of a line made up of a hundred other points from one aircraft, you could easily locate all the points between them, since in most cases the aircraft’s speed is nearly constant.
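To make the idea concrete, here is a minimal sketch of locating intermediate points between two known endpoints, assuming constant speed along a straight line. The function name `interpolate_track` and the `(latitude, longitude)` tuple layout are my own illustration, not the competition's data format, and interpolating linearly in degrees is only a rough approximation that holds for reasonably short segments.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees; hypothetical layout


def interpolate_track(t0: float, p0: Point, t1: float, p1: Point,
                      times: Sequence[float]) -> List[Point]:
    """Linearly interpolate positions between two known endpoints.

    Assumes the aircraft moves at constant speed on a straight line
    between (t0, p0) and (t1, p1).
    """
    if t1 == t0:
        raise ValueError("endpoints must have different timestamps")
    result = []
    for t in times:
        # Fraction of the segment covered at time t (0 at t0, 1 at t1).
        frac = (t - t0) / (t1 - t0)
        lat = p0[0] + frac * (p1[0] - p0[0])
        lon = p0[1] + frac * (p1[1] - p0[1])
        result.append((lat, lon))
    return result


# Example: three intermediate timestamps on a 100-second segment.
mid_points = interpolate_track(0.0, (47.0, 8.0), 100.0, (48.0, 10.0),
                               [25.0, 50.0, 75.0])
```

Note that this uses the later endpoint to fill in earlier points, which is exactly the "data from the future" that the question below is about.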

And the main question: can data from future test points be used when we predict coordinates for other points? I guess future data could significantly improve results.
In real life this is not possible, but I didn’t find any restriction in the competition rules.
I hope the organizers will clarify this point.
I think it would be better if the winners’ solutions were useful in practice.

Waiting for @masorx’s answer… I think the solution must be useful in real applications, so it is better not to use future data.

  1. I agree that models need to be practical and useful.

  2. It depends on what counts as practical. For example, if your task is anti-spoofing, then looking at the whole aircraft track is the right idea.

  3. It would be wrong to add significant details to the rules in the final stage of the competition.

  4. The training data we have includes “data from the future”, and obviously everyone uses it.

  5. It would be practical if models were limited to the data that the organizers provided in the data archive.


Hi John,

Excellent question and one we discussed internally before the competition. In short, we are well aware of this. For this round it is fine to use all given data without restrictions and, as mentioned, it wouldn’t be fair to change the rules after several weeks. However, we are definitely considering future rounds with such a premise.

There are use cases where we’d analyse historical data just like the data given and can use everything available: @vitaly_bondar has named one, and OSINT would be another. Thus, live tracking is not everything and there is utility in doing it after the fact. On the other hand, live tracking is very important in other fields, and applicable solutions do have higher value there.

Scientifically, it will be interesting to analyse the different approaches and see which ones are best for which use case. This is why we require the code to be open sourced for the award recipients. Then we can compare and evaluate.

Finally, part of what you write is actually applicable for live tracking, too: the best prediction for the next data point most of the time would be that aircraft heading & speed are staying constant.
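As a rough illustration of that last point, here is a minimal sketch of a constant-heading, constant-speed prediction that uses only past fixes, so it would also work for live tracking. The name `predict_next` and the `(time, latitude, longitude)` tuple layout are hypothetical, and extrapolating at a constant rate in degrees is a flat-earth simplification rather than a proper geodesic calculation.

```python
from typing import Tuple

Fix = Tuple[float, float, float]  # (time_s, lat_deg, lon_deg); hypothetical layout


def predict_next(prev: Fix, last: Fix, t_next: float) -> Tuple[float, float]:
    """Extrapolate the next position from the two most recent fixes.

    Assumes heading and speed stay constant, approximated here as a
    constant rate of change in latitude and longitude.
    """
    dt = last[0] - prev[0]
    if dt <= 0:
        raise ValueError("fixes must be in strictly increasing time order")
    # Rate of change per second, estimated from the last two fixes only.
    lat_rate = (last[1] - prev[1]) / dt
    lon_rate = (last[2] - prev[2]) / dt
    ahead = t_next - last[0]
    return (last[1] + lat_rate * ahead, last[2] + lon_rate * ahead)


# Example: two fixes 10 seconds apart, predicting 10 seconds further on.
predicted = predict_next((0.0, 47.0, 8.0), (10.0, 47.25, 8.5), 20.0)
```

Unlike interpolation between two endpoints, nothing here looks ahead of the point being predicted.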



Thank you for clarification.

I have participated in some Kaggle competitions where the winners used data leaks from the future. I’m not sure the organizers were happy with top-scoring solutions that relied on data leaks.

Hope this time you will get something useful, but for me, tasks with leaks are not so attractive.


Hi Evgeny,

That’s totally understandable! Hope to have you onboard in a future round/competition!

Out of interest, since in this setting it is impossible to withhold future data in a meaningful way (tracking, as opposed to point-by-point prediction, is explicitly desired), would you be happy with a simple rule against its use? We can only check after the fact whether contestants adhered to such a rule.

Hello @masorx

Just adding a rule that prohibits the use of future data is not enough in general. It is always possible to implement a method that solves the offline version of the problem with all the future data, and then fine-tune the official online method using the predictions of that hidden offline implementation as “ground truth”.
If you really want to solve an online tracking problem, you need some mechanism that strictly hides future data, like in the Flatland challenge. But that would require something like kernel competitions on Kaggle.


Hi masorx!

I understand the problem that it is impossible to hide future data, and I think a rule against using future data could be a good idea. Of course, you would need to spend more time investigating the top solutions, but it increases the chances of getting a practical solution.
A code competition with a hidden test set would also help: it could not fully prevent the use of future data, but it would at least make deep analysis of the test data harder.

I hope you will implement some restrictions for the next competitions. We are all interested in the practical implementation of DS solutions after competitions.


Since we have a lot of test data, we will certainly test the provided solutions for this round on it (although this won’t change the winners), but we will consider implementing hidden tests for the next rounds.
