Now that it’s all over and we can freely share our ideas, I’m curious whether anyone tried something based on non-homogeneous regression or quantile regression.
I saw that the WorldExperts team fit a “zero inflated log normal”, which to my understanding modelled a conditional variance parameter. My idea would be to use that uncertainty information in the pricing part of the process.
Our team briefly tried quantile regression on the severity component. I think the model was something like frequency * 95% quantile of severity, with both the quantile model and the frequency model fit by light gradient boosting.
We then ensembled this with a classical pricing strategy of 1.15*E(claim), taking the maximum of the two prices. This gave us a very small market share and profit in weeks 8 and 9.
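For anyone who wants to play with this, here is a rough sketch of the combination described above. It is only an illustration and not our actual code: the function names and the inputs (`freq`, `sev_q95`, `expected_claim`) are made up, and the pinball loss is included just to show what a 95% quantile model is trained to minimise.

```python
def pinball_loss(y_true, y_pred, q=0.95):
    """Average quantile (pinball) loss for quantile level q."""
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        diff = y - y_hat
        # Under-predictions are penalised by q, over-predictions by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def max_of_strategies(freq, sev_q95, expected_claim, loading=1.15):
    """Per-policy price = max(freq * 95% severity quantile, loading * E(claim))."""
    return [
        max(f * s, loading * e)
        for f, s, e in zip(freq, sev_q95, expected_claim)
    ]


# Hypothetical predictions for two policies (made-up numbers).
prices = max_of_strategies(
    freq=[0.10, 0.05],
    sev_q95=[2000.0, 1000.0],
    expected_claim=[100.0, 80.0],
)
```

In this toy example the quantile-based price wins for the first policy and the loaded expected claim wins for the second, which is the whole point of taking the maximum.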
Sadly I didn’t have much time to play around with it but I feel like it might be fruitful!
If I had had the time, I was planning on fitting some conditional distribution, for example with https://www.gamlss.com/ or with neural networks whose output layer gives the parameters of a distribution. Then I would try pricing strategies along the lines of
E(claim)+multiplier*standard deviation(claim)
I feel like this might be a nice single model way to deal with the potentially large claims.
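A minimal sketch of that pricing rule, assuming the fitted conditional distribution is a zero-inflated log-normal (a claim occurs with probability `p_claim`, and its size is log-normal with parameters `mu` and `sigma`). The parameter names and the function names are hypothetical, and in practice the parameters would come from whatever model you fit per policy.

```python
import math


def zero_inflated_lognormal_moments(p_claim, mu, sigma):
    """Return (mean, standard deviation) of claim = B * Y,
    where B ~ Bernoulli(p_claim) and Y ~ LogNormal(mu, sigma)."""
    mean = p_claim * math.exp(mu + 0.5 * sigma ** 2)
    second_moment = p_claim * math.exp(2.0 * mu + 2.0 * sigma ** 2)
    # Guard against tiny negative values caused by floating-point rounding.
    variance = max(second_moment - mean ** 2, 0.0)
    return mean, math.sqrt(variance)


def risk_loaded_price(mean, std, multiplier):
    """Price = E(claim) + multiplier * standard deviation(claim)."""
    return mean + multiplier * std


# Made-up parameters for one policy.
mean, std = zero_inflated_lognormal_moments(p_claim=0.1, mu=7.0, sigma=1.0)
price = risk_loaded_price(mean, std, multiplier=0.05)
```

Policies with the same expected claim but a heavier tail (larger `sigma`) get a higher price, which is how the single model would handle the potentially large claims.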
Anyone else play around with ideas like this? I thought it would be quite a natural way to approach it.
Yes. I tried using the standard deviation of the expected claim as a pricing factor for weeks 6 & 7. I actually thought it would be my killer for this game…
price = 1.15 * E(claim) + 0.01 * Std(claim)
However, the result was quite bad (though at that time I did not have a large claim detection model). I decided to drop this approach from week 8 onwards: with only 3 weeks left and a profit loading still to test, I couldn’t afford to add more variables to my pricing strategy.
The implementation that I was referring to is:
Going one step further, there are actually two types of uncertainty:
a) enough data, but the target variable itself has high variance
b) not enough data, so the model can only extrapolate
The conditional variance measures (a) only. For (b), I think most of us just manually added some loadings to the policies that we were not willing to take due to limited data. This could be solved by training a Bayesian network (for example with tensorflow-probability), but I didn’t have time to try it.
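To make the distinction between (a) and (b) concrete, here is a small sketch. Note that it does not use a Bayesian network: it swaps in a simpler stand-in, an ensemble of models that each predict a mean and a variance for the same policy, and splits the total variance with the law of total variance. The function name and the numbers are made up for illustration.

```python
import statistics


def decompose_uncertainty(member_means, member_variances):
    """Split predictive variance for one policy across an ensemble.

    (a) data noise: average of the variances each member predicts.
    (b) model uncertainty: spread of the members' predicted means.
    """
    data_noise = statistics.fmean(member_variances)
    model_uncertainty = statistics.pvariance(member_means)
    return data_noise, model_uncertainty, data_noise + model_uncertainty


# Members agree on the mean: only type (a) uncertainty is present.
well_covered = decompose_uncertainty([100.0, 100.0, 100.0], [400.0, 400.0, 400.0])

# Members disagree strongly: type (b) dominates, e.g. a sparsely covered segment.
sparse = decompose_uncertainty([50.0, 150.0], [100.0, 100.0])
```

A loading based on the second component would replace the manual loadings on policies with limited data, at least in principle.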
I will tidy up my code and share with everybody soon.
Nice to hear that you also thought it would be good, I was almost sure it would be the key to this!
I may have also missed some discussion on this in the town hall, as I had to leave after an hour.
Thanks for sharing the link. Something like this was always on my to-try list, but I ended up being very busy near the end of the competition. I was just going to go for something more direct: run a binary classification for frequency, then fit a gamma or log-normal for the non-zeros, and try pretty much exactly what you said for a pricing strategy.
I never considered going Bayesian, but it definitely makes sense!
I took on this competition with a similar idea and ended up in 12th place on the leaderboard.
To predict uncertainty, I used the ngboost model, which is a kind of gradient boosting model, like xgboost and lightgbm.
The final pricing was as follows:
train lightgbm x3 and ngboost x3 (each model has different seeds and tree parameters)
predict the mean and SD from the models (the means are mainly from lightgbm, which was more accurate than ngboost in my validation; the SDs are from ngboost)
my final submitted price was
price = 0.7 * mean + 0.75 * SD + (some adjustment by percentile)
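A rough sketch of the recipe above for a single policy. The simple averaging across the three models of each type is an assumption (the means are only described as "mainly" from lightgbm), the function and argument names are hypothetical, and the percentile adjustment is left as a plain number because its form is not specified.

```python
from statistics import fmean


def blended_price(lgbm_means, ngboost_sds, w_mean=0.7, w_sd=0.75, adjustment=0.0):
    """Price = w_mean * mean + w_sd * SD + adjustment for one policy.

    lgbm_means:  predicted expected claims from the lightgbm models.
    ngboost_sds: predicted standard deviations from the ngboost models.
    adjustment:  placeholder for the unspecified percentile adjustment.
    """
    mean = fmean(lgbm_means)   # assumed: simple average over seeds
    sd = fmean(ngboost_sds)    # assumed: simple average over seeds
    return w_mean * mean + w_sd * sd + adjustment


# Made-up predictions from 3 lightgbm and 3 ngboost models.
price = blended_price([95.0, 100.0, 105.0], [38.0, 40.0, 42.0])
```

With these numbers the mean is 100 and the SD is 40, so the price is 0.7 * 100 + 0.75 * 40 = 100 before any adjustment.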
To be honest, until the middle of the period, I was sticking to the constant multiplier method.
Because I was late in switching to the uncertainty model, I was unable to increase market share due to insufficient consideration of pricing levels, and as a result, I was unable to make sufficient profits.
I planned to use NGBoost. Never got around to fixing our implementation for it though.
Here’s our code, though we did not do well. I didn’t put up the pricing competition we made because it’s a bit of a mess.
In the end we tried a quadratic function of expected claims for pricing but I think we priced far too low.
The model for expected claims was LightGBM with Tweedie loss. I think we needed more work on the pricing competition.
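For readers who have not used it, Tweedie loss suits claims data because it allows an exact zero for most policies and a skewed positive amount for the rest. Below is a small sketch of the unit Tweedie deviance for a variance power between 1 and 2. It is a plain-Python illustration rather than LightGBM's internal code, and the power of 1.5 is just an illustrative choice, not necessarily what our model used.

```python
def tweedie_deviance(y, mu, power=1.5):
    """Unit Tweedie deviance for 1 < power < 2, with y >= 0 and mu > 0."""
    if not 1.0 < power < 2.0:
        raise ValueError("power must be strictly between 1 and 2")
    if y < 0 or mu <= 0:
        raise ValueError("requires y >= 0 and mu > 0")
    term_y = y ** (2.0 - power) / ((1.0 - power) * (2.0 - power))
    term_cross = y * mu ** (1.0 - power) / (1.0 - power)
    term_mu = mu ** (2.0 - power) / (2.0 - power)
    return 2.0 * (term_y - term_cross + term_mu)


# A perfect prediction has zero deviance; a zero claim is still well defined.
perfect = tweedie_deviance(100.0, 100.0)
no_claim = tweedie_deviance(0.0, 4.0)
```

The deviance stays finite when the observed claim is zero, which is why a single model can be trained on all policies without a separate frequency step.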