Submission failure: Non-unique id_policy - year combinations in submission dataset

Hi all,

After I had issues with my submissions, I added extensive logging (submission number: #119545).

Without changing anything in x_raw, I added the following lines.
First, get the length of x_raw:
print(paste("raw: ",length(x_raw$year)))

Then compare it with the number of unique id_policy - year combinations:
x_raw$id_policy_year = paste(x_raw$id_policy, x_raw$year, sep = "-")
print(paste("unique pol_year: ",length(unique(x_raw$id_policy_year))))

As generally stated, a policy should not appear more than once in a given year.

However, when looking into the log file of my submission, I see the following:
[1] "raw: 2000"
[1] "unique pol_year: 1999"

This means there is an id_policy - year combination that appears twice in x_raw, which breaks my code.
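
For anyone who wants to see which rows are affected rather than only the counts, here is a minimal sketch in base R. The small data frame is a made-up stand-in for x_raw (the real one has more columns), and the policy ids are invented for illustration.

```r
# Toy stand-in for x_raw; ids and values are invented for illustration.
x_raw <- data.frame(
  id_policy = c("PL0001", "PL0002", "PL0002", "PL0003"),
  year      = c(1, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Same key construction as in the logging lines above.
key <- paste(x_raw$id_policy, x_raw$year, sep = "-")

# duplicated() only flags the second and later occurrences, so combine it
# with a pass from the end to flag every row that shares a key.
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

dup_rows <- x_raw[is_dup, ]
print(dup_rows)
```

Printing dup_rows shows every row that takes part in a duplicated combination, which makes it easier to check whether the duplicates are exact copies or differ in other columns.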

Can you please have a look and let me know if this is a bug?

Thanks a lot and best regards.

Hi @fxs

Thanks for pointing that out!

This can happen because some of the data your code is run on, in order to give you feedback, is synthetic. Once that run clears, we move on to running your code on the real data set for leaderboard generation. We don't give you a traceback on the real data because the error message could leak data.

Would you be able to handle this possibility in your code? That is, where a pol_year is duplicated, just treat the duplicate rows identically?
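
One way this could look, as a rough sketch rather than a required approach: predict once per unique id_policy - year key and copy that prediction back to every row with the same key, so the output keeps the same length and order as x_raw. Here predict_fn is a hypothetical placeholder for your own model code, and the toy data is invented. Note that this uses the first occurrence of each key, which is an assumption about what "identically" should mean if duplicate rows differ in other columns.

```r
# Hypothetical placeholder for the participant's own prediction code.
predict_fn <- function(df) df$year * 100

predict_with_duplicates <- function(x_raw, predict_fn) {
  key <- paste(x_raw$id_policy, x_raw$year, sep = "-")

  # Keep only the first row for each id_policy - year key.
  keep       <- !duplicated(key)
  first_rows <- x_raw[keep, , drop = FALSE]
  unique_key <- key[keep]

  # Predict once per unique key.
  preds <- predict_fn(first_rows)

  # match() maps every original row to the position of its key among the
  # unique keys, so duplicated rows receive the same prediction.
  preds[match(key, unique_key)]
}

# Toy stand-in for x_raw with one duplicated combination.
x_raw <- data.frame(
  id_policy = c("PL0001", "PL0002", "PL0002", "PL0003"),
  year      = c(1, 1, 1, 2),
  stringsAsFactors = FALSE
)

out <- predict_with_duplicates(x_raw, predict_fn)
print(out)
```

The result has one value per row of x_raw, and the two rows sharing a key get the same value.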

As this is not something that occurs in real life, I will look into updating those synthetic rows to ensure this kind of thing does not occur in the near future.

I think I will find a solution, though it would be much better if this were fixed.
