What does vh_weight == 0 mean?

Should it be NA?
Does it mean something special, like “motorbike”?

Hi @simon_coulombe

It should not be NA. The zeros are what is in the original data.

I don’t know for certain, but having looked at the data I would assume it marks vehicles lighter than a certain threshold. For example, you can see that a second driver is rarer for these vehicles (as is likely the case for motorbike insurance).

In Python you can check this by:

zero_weight_policies = df[df['vh_weight'] == 0]  # df is the training dataframe
print('Fraction of second driver for "zero-weight" vehicles: \t{:.2f}'.format((zero_weight_policies['drv_drv2'] == 'Yes').mean()))
print('Fraction of second driver for all vehicles: \t \t{:.2f}'.format((df['drv_drv2'] == 'Yes').mean()))

Which gives you:

# Fraction of second driver for "zero-weight" vehicles:  0.22
# Fraction of second driver for all vehicles:            0.33

So yes, motorbikes likely have vh_weight == 0. But other light vehicles probably fall into this category as well.
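If you want the two groups side by side rather than "zero-weight" against everyone, here is a minimal sketch of the same check using a groupby. The column names vh_weight and drv_drv2 are the ones used above; the function name and the toy dataframe are made up for illustration and are not the competition data.

```python
import pandas as pd


def second_driver_rate_by_weight_group(df: pd.DataFrame) -> pd.Series:
    """Share of policies with a second driver, split by vh_weight == 0 or not."""
    # Label each policy as "zero_weight" or "other" (labels are hypothetical).
    group = df["vh_weight"].eq(0).map({True: "zero_weight", False: "other"})
    has_second_driver = df["drv_drv2"] == "Yes"
    # Mean of a boolean series is the fraction of True values per group.
    return has_second_driver.groupby(group).mean()


# Tiny made-up frame, only to show the call.
toy = pd.DataFrame({
    "vh_weight": [0, 0, 0, 0, 1200, 950, 1400, 1100],
    "drv_drv2": ["No", "No", "No", "Yes", "Yes", "No", "Yes", "No"],
})
rates = second_driver_rate_by_weight_group(toy)
print(rates)
```

On the real training dataframe you would pass df instead of toy. A lower rate in the "zero_weight" group is consistent with the motorbike guess, but it does not prove it.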
