✖ Problem with `id_policy`

Hi, could someone confirm whether, in predict_expected_claims(), the object passed to the x_raw argument is supposed to have the id_policy column? If so, why is my code failing as if the column were not present? It is present in the object I am attempting to left join. Does it have another name? For example, see 115441 and the excerpt below from that log:

Error: Join columns must be present in data.
✖ Problem with `id_policy`.

  1. ├─(function (file, local = FALSE, echo = verbose, print.eval = echo, …
  2. │ ├─base::withVisible(eval(ei, envir))
  3. │ └─base::eval(ei, envir)
  4. │ └─base::eval(ei, envir)
  5. ├─global::predict_expected_claim(trained_model, Xraw) predict.R:42:2
  6. │ └─%>%(…) predict_expected_claim.R:28:2
  7. ├─dplyr::mutate(…)
  8. ├─dplyr::left_join(., model[[3]], by = "id_policy")
  9. └─dplyr:::left_join.data.frame(., model[[3]], by = "id_policy")
  10. └─dplyr:::join_mutate(…)
  11. └─dplyr:::join_cols(...)
  12.   └─dplyr:::standardise_join_by(by, x_names = x_names, y_names = y_names)
  13.     └─dplyr:::check_join_vars(by$x, x_names)

Warning message:
no DISPLAY variable so Tk is not available

Hi @bikeactuary

I can confirm that all of the leaderboard and test data is identical in column names to the training data. So they all have id_policy and it’s in the same position and so on.

However, one thing that could be an issue is that id_policy values are not unique in the data, so maybe when you left_join() R doesn’t know how to match the rows?

The reason they are not unique is that for each id_policy there may be multiple values of the column year, tracking a specific policy over multiple years.

Can you confirm that this would not cause an issue?

The RHS of the join is approx. 57,000 rows, with unique id_policy values.
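For what it's worth, repeated id_policy values on the left with a unique RHS is just a many-to-one join, which should not error. Here is a minimal sketch using base R merge() so it runs without extra packages; dplyr::left_join() handles this case the same way in terms of matching. The data frames and the score column are made up for illustration.

```r
# Toy data (hypothetical values): the left side repeats id_policy across
# years, the right side has exactly one row per id_policy.
x <- data.frame(
  id_policy = c("PL1", "PL1", "PL2"),
  year      = c(1, 2, 1),
  stringsAsFactors = FALSE
)
y <- data.frame(
  id_policy = c("PL1", "PL2"),
  score     = c(0.1, 0.2),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every left-hand row, i.e. a left join.
joined <- merge(x, y, by = "id_policy", all.x = TRUE)

# Each left-hand row is kept once and picks up its policy's score.
print(joined)
```

So the row count stays at nrow(x) and nothing complains about the duplicates.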

The error appears to be saying that a column does not exist on one side of the join or the other. This one has me pulling my hair out.
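A quick way to narrow this kind of error down is to compare the join key against the column names on each side just before the join. The last frame in the traceback above is check_join_vars(by$x, x_names), which suggests the check failed on the left-hand (x) side, but that is only a reading of the log. The sketch below uses a hypothetical helper, missing_join_keys(), and toy data with a deliberately misnamed column; none of it comes from the actual submission.

```r
# Hypothetical helper (not part of the competition code): report which
# join keys are absent from each side before calling a join.
missing_join_keys <- function(x, y, by) {
  list(
    x = setdiff(by, names(x)),  # keys missing from the left-hand side
    y = setdiff(by, names(y))   # keys missing from the right-hand side
  )
}

# Toy stand-ins for the real objects. The left side has a misnamed key
# on purpose so that the check has something to report.
x_raw  <- data.frame(policy_id = c("PL1", "PL2"), year = c(1, 1))
lookup <- data.frame(id_policy = c("PL1", "PL2"), score = c(0.1, 0.2))

missing <- missing_join_keys(x_raw, lookup, by = "id_policy")
print(missing)
```

If the x entry is non-empty, something upstream of the join (a select(), rename(), or similar step) has probably dropped or renamed the column before left_join() sees it.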

OK, figured it out. Stupid mistake as usual. Thanks!
