Should it be NA?
Does it mean something special, like “motorbike”?
No, it should not be NA. A weight of 0 is what appears in the original data.
I don’t know for certain, but having looked at the data I would assume that it marks vehicles lighter than a certain threshold. For example, you can see that a second driver is rarer for these vehicles (as is likely the case for motorbike insurance).
In Python you can check this with:
zero_weight_policies = df[df['vh_weight'] == 0] # df is the training dataframe
print('Fraction of second driver for "zero-weight" vehicles: \t{:.2f}'.format((zero_weight_policies['drv_drv2'] == 'Yes').mean()))
print('Fraction of second driver for all vehicles: \t \t{:.2f}'.format((df['drv_drv2'] == 'Yes').mean()))
Which gives you:
# Fraction of second driver for "zero-weight" vehicles: 0.22
# Fraction of second driver for all vehicles: 0.33
So yes, a motorbike would likely have a weight of 0. But other light vehicles will probably fall into this category as well.
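If you want to repeat this comparison without copying the print statements, here is a minimal sketch of the same check as a helper. The function name `second_driver_rate_by_weight_group` and the small `toy` frame are made up for illustration; only the column names `vh_weight` and `drv_drv2` come from the data, and the toy values are not from the competition dataset.

```python
import pandas as pd


def second_driver_rate_by_weight_group(df):
    """Share of policies with a second driver, split by whether vh_weight is 0.

    Hypothetical helper: assumes the columns 'vh_weight' and 'drv_drv2'
    exist and that a second driver is encoded as the string 'Yes'.
    """
    # Label each policy as "zero-weight" or "other"
    group = (df['vh_weight'] == 0).map({True: 'zero-weight', False: 'other'})
    # Boolean series: True where the policy has a second driver
    has_second_driver = df['drv_drv2'] == 'Yes'
    # Mean of a boolean series is the fraction of True values per group
    return has_second_driver.groupby(group).mean()


# Made-up rows, only to show how the helper is called
toy = pd.DataFrame({
    'vh_weight': [0, 0, 0, 0, 950, 1200, 1400, 1100],
    'drv_drv2': ['No', 'No', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No'],
})
rates = second_driver_rate_by_weight_group(toy)
print(rates)
```

On the real training dataframe you would call `second_driver_rate_by_weight_group(df)` instead. The same grouping could be reused for other columns if you want more evidence about what the zero-weight vehicles are.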