Which IDs should the output be predicted for? Is it the ones in round1_sample_empty.csv?
Across the 7 files put together there are 716624 records with null values, but round1_sample_empty.csv contains only 109475 records. Why the difference?
I also see that the IDs are unique within a file but repeated across the files. Which ones should be considered?
No worries, the category name may not have been added correctly for some reason; I will look into it.
As for your question: you basically need to use round1_competition.csv.zip (download it from the resources section) and predict the null values in it, i.e. 109474 records (+1 for the header).
The 7 files you mentioned are part of the training dataset.
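A minimal sketch of how you might list the rows to predict, using only the Python standard library. It assumes the zip holds a single CSV with an ID column and a target column; the names `ID` and `target` are hypothetical placeholders (swap in the real headers), and treating the text values `NULL`/`NaN`/`NA` as missing is also an assumption.

```python
import csv
import io
import zipfile


def ids_to_predict(zip_path, id_column="ID", target_column="target"):
    """Return the IDs whose target cell is empty in the CSV inside the zip.

    id_column and target_column are hypothetical names; replace them with
    the actual headers used in round1_competition.csv.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the archive contains one CSV; take the first one found.
        csv_name = next(n for n in archive.namelist() if n.endswith(".csv"))
        with archive.open(csv_name) as raw:
            reader = csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8", newline=""))
            return [
                row[id_column]
                for row in reader
                # Blank cells and literal NULL/NaN/NA strings count as missing (assumption).
                if (row[target_column] or "").strip().upper() in {"", "NULL", "NAN", "NA"}
            ]
```

For example, `len(ids_to_predict("round1_competition.csv.zip"))` should come out to 109474 if the column names match; the submission file then has one extra line for the header.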