Which IDs should be considered?

thanish · July 6, 2020, 3:05pm

For which IDs should the output be predicted? Is it the ones in round1_sample_empty.csv?
Put together, the 7 files contain 716624 records with null values, but only 109475 records are present in round1_sample_empty.csv.

I also see that the IDs are unique within a file but are repeated across the files. So which ones should be considered?

shivam · July 6, 2020, 3:06pm

Hi @thanish, can you please share which challenge you are referring to?

thanish · July 6, 2020, 3:08pm

The Aircraft Localization challenge. I’m sorry, is this a common forum? I thought I had created the topic under that challenge.

shivam · July 6, 2020, 3:23pm

No worries, the category name may not have been added correctly for some reason; I will look into it.

As for your query: you need to use round1_competition.csv.zip (download it from the resources section) and predict the null values in it, i.e. 109474 rows (+1 for the header).

The 7 files you mentioned are part of the training dataset.

~/Downloads❯ cat round1_competition.csv | grep -E "NaN,NaN" | wc -l
  109474
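
If it is more convenient to get the actual IDs rather than just a count, the same check can be sketched in Python. This is only a rough sketch: the column names `id`, `latitude` and `longitude` are assumptions on my part (check them against the header of round1_competition.csv), and the tiny in-memory sample below is made-up data standing in for the real file.

```python
import csv
import io


def ids_to_predict(rows, target_cols=("latitude", "longitude"), id_col="id"):
    """Return the IDs of rows whose target columns are all missing (NaN or empty).

    The column names are assumptions; adjust them to the real CSV header.
    """
    missing = {"", "nan"}
    return [
        row[id_col]
        for row in rows
        if all(row[col].strip().lower() in missing for col in target_cols)
    ]


# Made-up stand-in for round1_competition.csv, only to show the idea.
sample = io.StringIO(
    "id,latitude,longitude\n"
    "1,48.1,11.5\n"
    "2,NaN,NaN\n"
    "3,NaN,NaN\n"
)
ids = ids_to_predict(csv.DictReader(sample))
print(len(ids), ids)
```

For the real file, open it with `open("round1_competition.csv", newline="")` and pass `csv.DictReader(f)` to the same function. If the column assumptions hold, the number of IDs should agree with the grep count above. Note that the grep matches "NaN,NaN" anywhere in a line, so the two counts could differ if other adjacent columns are also NaN.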

I hope it helps.