Question on the composition of the final dataset.
It says it contains around 100K policies in their 5th year, some from the training data and some new.
Looking at the numbers: Training: 60K, RMSE: 5K, 10 Weekly: 30K, which adds up to about 95K.
Is the final data the 5th year of all of those policies? If so, the “new to you” part would come from the fact that we never actually saw the data for the policies in the RMSE and 10 weekly sets.
If I’m understanding correctly, pol_sit_duration would then be at least 5, since in the training data pol_sit_duration is never smaller than year. Is that right?
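
For anyone who wants to verify the pol_sit_duration claim themselves, here is a minimal sketch of the check. The rows below are made-up toy values, not the real training data, and the helper name duration_never_below_year is just something I picked for illustration; only the column names pol_sit_duration and year come from the dataset.

```python
# Toy stand-in for the training data: the values are invented for illustration.
toy_rows = [
    {"year": 1, "pol_sit_duration": 1},
    {"year": 2, "pol_sit_duration": 2},
    {"year": 3, "pol_sit_duration": 6},
    {"year": 4, "pol_sit_duration": 7},
]


def duration_never_below_year(rows):
    """Return True if pol_sit_duration >= year holds on every row."""
    return all(row["pol_sit_duration"] >= row["year"] for row in rows)


def violating_rows(rows):
    """Return the rows where pol_sit_duration is smaller than year."""
    return [row for row in rows if row["pol_sit_duration"] < row["year"]]


holds = duration_never_below_year(toy_rows)
print("pol_sit_duration >= year on every row:", holds)
```

If the training data is loaded into a pandas DataFrame df, the same check would be (df["pol_sit_duration"] >= df["year"]).all(). If it holds there and carries over to the final data, every 5th-year row would have pol_sit_duration of at least 5.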