A thought for people with frequency-severity models

Did you try adding the predicted frequency to the severity model?

Maybe “being unlikely to make a claim” means you only call your insurer after a disaster?


What do you mean? Loss cost = Frequency * (Severity + Frequency * some loading)? If a risk only calls for “a disaster”, they should get a lower frequency estimate and a higher severity estimate, assuming you can identify segments like that. I would also expect a higher process variance in their losses, since more of the uncertainty sits in the severity, so their underwriting profit provision should be higher than for a risk with a similar loss cost but a higher frequency and lower severity. That would make a loading based on frequency alone inappropriate.
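To put a number on that process-variance point, here is a quick compound Poisson–gamma calculation (my own illustrative figures, not from anyone's filing) comparing two risks with the same expected loss cost but a different frequency/severity mix:

```python
# Aggregate loss S = sum of N gamma claims, N ~ Poisson(lam):
#   E[S] = lam * E[X],   Var(S) = lam * E[X^2]
def agg_moments(lam, shape, scale):
    ex = shape * scale                       # gamma mean
    ex2 = shape * (shape + 1) * scale ** 2   # gamma E[X^2]
    return lam * ex, lam * ex2

# same loss cost (200), different frequency/severity mix
mean_a, var_a = agg_moments(0.20, 2.0, 500.0)   # frequent, small claims
mean_b, var_b = agg_moments(0.05, 2.0, 2000.0)  # rare, large claims

print(mean_a, mean_b)  # identical expected loss costs
print(var_b / var_a)   # severity-driven risk carries 4x the process variance
```

So even at equal loss cost, the rare-but-large risk has four times the process variance here, which is the argument for a higher profit provision on it.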

I mean adding a feature named “predicted_frequency” to the severity model and checking if that improves the severity model.

I thought about this because back in the day I was interested in a different type of discrete-continuous model: “which type of heating system do people have in their house” and “how much gas do they use if they picked gas?”, where the predicted probability of all the other systems works its way into the gas consumption model to correct some selection bias. (Dubin and McFadden 1984, don’t read it): https://econ.ucsb.edu/~tedb/Courses/GraduateTheoryUCSB/mcfaddendubin.pdf

An example explanation then would be: "if you picked gas (higher upfront cost than electricity, lower cost per energy unit, so only economical when you need a lot of energy) despite having a really small house (measured), then you probably have really crappy insulation (not measured) and will probably consume more energy than would be predicted from your small square footage alone."

In that case, the estimate of the coefficient relating “square footage” to “gas consumption” would be biased downward, since all big houses get gas but only badly insulated small houses get gas.
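That bias is easy to simulate. In this made-up setup, gas use truly rises 2.0 units per square foot and bad insulation (unobserved by the modeler) adds a constant; because gas households are either big houses or badly insulated small ones, the naive slope fit on gas households alone lands below the true 2.0:

```python
import random

random.seed(1)
n = 4000

# true model: gas_use = 2.0 * sqft + 800 * bad_insulation + noise
rows = []
for _ in range(n):
    sqft = random.uniform(500, 3000)
    bad_insulation = random.random() < 0.3   # not observed by the modeler
    picks_gas = sqft > 2000 or bad_insulation
    if picks_gas:
        use = 2.0 * sqft + 800 * bad_insulation + random.gauss(0, 100)
        rows.append((sqft, use))

xs = [x for x, _ in rows]
ys = [y for _, y in rows]
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)

# OLS slope of gas use on sqft, gas households only
slope = (sum((x - mx) * (y - my) for x, y in rows)
         / sum((x - mx) ** 2 for x in xs))
print(round(slope, 2))  # comes out well below the true 2.0
```

The small houses that made it into the gas sample are disproportionately the badly insulated, high-consumption ones, which flattens the fitted slope.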

It’s not the same purpose, but maybe there’s some signal left.

I didn't think about this for long - this might be one of the 111 out of 112 of my ideas that are useless :slight_smile:

It’s a reasonable idea. In theory I think it’s similar to the idea behind what’s called a “double generalized linear model”. That kind of model lets the dispersion parameter of a Tweedie GLM vary by segment, which is effectively the same as letting p vary by segment, and p reflects how much of the variation in the response is driven by frequency vs. severity: when p is close to 2, you are assuming a distribution driven mostly by severity, and when p is closer to 1, the variation in aggregate loss comes mostly from the claim count distribution of the risk. Letting this vary among risks accomplishes something similar to feeding in a frequency estimate, because the model predicts loss cost given information about whether a risk's losses are mostly frequency- or severity-driven.
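To see the p interpretation concretely: a Tweedie with 1 < p < 2 is a compound Poisson sum of gamma claims, and the implied claim frequency can be backed out from (mu, phi, p) with the standard reparameterization (this is textbook Tweedie algebra, sketched below, not something specific to the linked reading). Holding the mean and dispersion fixed, moving p toward 2 means fewer, larger claims:

```python
# Tweedie with 1 < p < 2 as a compound Poisson-gamma (standard identities)
def compound_parts(mu, phi, p):
    lam = mu ** (2 - p) / (phi * (2 - p))  # implied Poisson claim frequency
    alpha = (2 - p) / (p - 1)              # gamma shape
    theta = phi * (p - 1) * mu ** (p - 1)  # gamma scale
    return lam, alpha, theta

mu, phi = 100.0, 2.0
for p in (1.1, 1.5, 1.9):
    lam, alpha, theta = compound_parts(mu, phi, p)
    var_cp = lam * alpha * (alpha + 1) * theta ** 2  # lam * E[X^2]
    # variance matches phi * mu^p, and lam shrinks as p -> 2
    print(p, round(lam, 2), round(var_cp - phi * mu ** p, 6))
```

The variance check confirms Var = phi * mu^p for every p, while the implied frequency drops as p approaches 2 - the "mostly severity-driven" end of the range.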

Page 96 here describes the idea:


More reading on the pile - thanks for the link (and the summary)!
