How are you doing?

Hey there!

This far into the competition, and with some time off, I want to start a discussion so that this forum is more than just:

  • debugging issues
  • bugging @alfarzan about evaluation metrics/leaderboard … :smile:



… Are you overlooking it? :see_no_evil:

The first part of this competition is a relatively standard modeling challenge: tabular data, good exposure, a few handfuls of variables.

And then there’s a 2nd portion that I believe most might overlook… the pricing strategy.

I know I am… …guilty! :robot:



Creating my own internal leaderboard :chart:

Having several models on hand, today I tried to actually look at pricing, instead of just RMSEs.

  • 15 models, with diverse algorithms,
  • 68.5k test policies,
  • lowest price wins.

I calculated profit, loss_ratio, closing_ratio, and avg_premium for each model.
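For anyone who wants to run the same exercise, here's a minimal sketch of that internal market. Everything is synthetic (costs, prices, policy counts are made up, not my actual data), but the mechanics match: the lowest quote wins each policy, then the four metrics are computed per model.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: 15 models quoting prices for 1,000 test policies.
# (The real run used 68.5k policies; all numbers here are illustrative.)
n_models, n_policies = 15, 1_000
true_cost = rng.gamma(shape=0.5, scale=200.0, size=n_policies)      # claims per policy
prices = true_cost * rng.lognormal(mean=0.1, sigma=0.3, size=(n_models, n_policies))

winner = prices.argmin(axis=0)                      # cheapest quote wins each policy
won = winner[None, :] == np.arange(n_models)[:, None]

for m in range(n_models):
    premium = prices[m, won[m]]
    claims = true_cost[won[m]]
    profit = premium.sum() - claims.sum()
    loss_ratio = claims.sum() / premium.sum() if premium.size else float("nan")
    closing_ratio = won[m].mean()
    avg_premium = premium.mean() if premium.size else float("nan")
    print(f"model {m:2d}: profit={profit:9.0f}  LR={loss_ratio:.2f}  "
          f"close={closing_ratio:.2%}  avg_prem={avg_premium:.0f}")
```

Swapping in your own models' price vectors for `prices` is all it takes to reproduce the table.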


Results :trophy:

First off, it was interesting to see how spread out the market share (closing ratio) was.
There's no single clear winner.

Also very interesting to see some models winning with significantly lower avg_premiums (model #13), and on significant volume. That tells me its larger-risk pricing is not competitive at all.

But can I really take anything away from this? … This is just a sample, and there's limited data to create enough folds.
…ugh. :zap: Randomness in insurance is challenging…


And you? :wave:

How are you feeling these days… Are you overlooking the pricing?
Or have you all been developing price optimization, simulating markets and competitor prices? :chart_with_upwards_trend: :chart_with_downwards_trend:


Side note:

I’m guilty of having spent way too much time on this project… :mantelpiece_clock:
I’m curious about others: were you drawn in more than you would’ve expected?


Wow, this is great. Thanks so much for sharing! Blurring the model names looks pretty cool. I think we can mostly guess what they are :slight_smile:, but I assume that’s your intent.

I totally agree that most teams probably focused way more on the loss accuracy model (my team is definitely guilty of this). We actually did something very similar to you this week, though with fewer models; we also tried sampling different portions of the training data to run the profit simulations on. I agree that it’s still difficult to figure out what to do for a pricing strategy.

The thing is, though: if this type of simulation were able to lead to some insights, then I think we should have seen more consistency in the weekly profit rankings. Since we haven’t, it makes me question whether there really is a winnable pricing strategy. Whether the answer is yes or no, it’s super interesting.

What seems to be clear from the leaderboards is that RMSE isn’t as correlated to weekly profit as maybe most would have expected. Beyond that, I’m not sure if anyone is feeling confident about their pricing strategy…

I’ve also spent way too much time on this project… but it’s been a great learning experience.


Thanks for the insight! I feel bad scrolling down the forums and seeing that everything is just debugging posts.

Yeah, so far our team has been completely overlooking the pricing aspect, spending all our time on climbing the RMSE leaderboard and making our code easy to work with so we can run these experiments in the future.

We haven’t even got to the stage of setting up these market competitions yet! It’s definitely on our radar that we have to do it and figure out the best way to evaluate pricing strategies for the competition if we want to be successful.

This is our third week of it and we’ve been doing pretty poorly on the profit leaderboard… which is no surprise to me!

I’d love to see a plot of RMSE vs. average profit for each week after the competition.


So, since @simon_coulombe was so generous with his loss model, here’s the week 7 submission RMSE versus profit ranking plot :smiley: (the two charts are based on the same data, but the x-range has been adjusted):

I also scraped the submissions for the other weekly profit leaderboards. Here’s the CSV. Hoping someone does a cool analysis with it :slight_smile:.
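One hypothetical starting point for that analysis: a Spearman rank correlation between RMSE and weekly profit, in plain numpy. The arrays below are placeholders to show the shape of the calculation, not the real values from the shared CSV, and this sketch doesn't handle ties.

```python
import numpy as np

def spearman(a, b):
    # Rank-transform both arrays (no tie handling in this sketch),
    # then take the Pearson correlation of the ranks.
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

# Placeholder columns standing in for one week's leaderboard; NOT real data.
rmse = np.array([0.92, 0.95, 0.88, 1.01, 0.97, 0.90])
profit = np.array([120.0, -40.0, 30.0, 80.0, -10.0, 55.0])
print(f"Spearman(RMSE, profit) = {spearman(rmse, profit):+.2f}")
```

A value near zero across the weeks would back up the observation that RMSE isn't as correlated to weekly profit as expected.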


Thanks for sharing the model comparison; it gives a good idea of how different approaches can give very different results even when they all appear to fit the data reasonably well. These are my current thoughts on the pricing part of the problem, for what they’re worth. It’s not exactly a strategy, more a set of considerations when designing one:

  1. The need for a loading on your model output: if everyone submitted prices straight from their models without applying any loadings, then (assuming everyone’s models produce estimates that are on average similar to the test data) the winning price for any individual policy is likely to be lower than the average. You are more likely to win policies where your model underestimates the cost (and so will make a loss), and you are less likely to win a policy where your model accurately predicts claims, or where it overestimates the cost, because someone else is likely to have a model that produces a lower estimate. A loading needs to be applied to your estimated cost of claims to allow for this, or losses are almost certain. The loading needs to be high enough to overcome the ‘winner’s curse’, but not so high that you end up with no policies.

  2. The loading should be different for different policies:
    Regardless of what model you use, the estimated level of claims on any policy is informed by the claims on similar policies and by examining how differences between policies lead to differences in claims. The more data you have on policies and their similarities or differences, the better the model is likely to be at estimating. Any model is likely to perform worse in areas of the data where there are few other policies similar to the one you are trying to estimate. The range of estimates different models come up with is likely to be wider when there is less relevant data to base the estimate on, so the risk of winning a policy which is significantly underpriced is higher.

  3. Better claims models matter:
    In the case of a market with two insurers where one insurer charges the same price for all risks and the second sets prices just on the average of past claims by vehicle value, the winner’s curse is likely to impact the first insurer more than the second. The first will win all the high-value vehicle policies at a price which is too low, and won’t win any of the low-value vehicles where its price is too high. The second insurer may be able to win the policies on lower-value vehicles but won’t make much of a profit. The same logic applies when the models get more complicated. Having a better model lets you pick the better risks at better prices.

  4. Your competitors’ prices matter:
    Going back to the last example, the second insurer is losing out on potential profit by charging a price significantly below the first insurer’s for low-value vehicles. Really, they only need to charge a little less than their competitor to win the policy; anything lower than that reduces their profitability.

  5. Randomness matters:
    A small number of large claims accounts for a significant part of the total claims cost. If you have a cheaper price for these policies than others do, you are more likely to have poor results. To some extent large claims are more likely to occur on some policies than on others, but they still occur at random. If a loading is too high, your portfolio will probably be too small to absorb the cost of random large claims when they occur. Randomness can also work in your favour if you set a price which includes an allowance for large claims but end up with fewer large claims than expected.

A winning pricing strategy is likely one that is based on a good claims model that uses the data in the best way to form good estimates of losses, applies appropriate loadings to offset model deficiencies and the winner’s curse, and manages the random element well. There isn’t much detailed feedback on how your price compares to your competitors’, so that is probably the hardest part to manage. Getting the balance right on all of these factors is not easy.

The top position on the final leaderboard is probably going to be a bit random (particularly if there are some really big claims in the final data set), but the real prize is mainly what we learn along the way. I know I’ve learned a lot with this challenge, so thanks to the organisers for all their efforts setting this up and running it.


Only one simple comment from my side: wayyyyyyyyyyyyy too much time spent on this project :smiley: But it was indeed very instructive; in fact, we are very happy about all the analysis/findings we did! Now let’s see how we eventually end up in the competition, to see whether all the hours spent go in the bin or not :wink:


I just joined. I am trying to brush up on R and am at the moment analysing the data in Excel. I still need to build a GLM.


So I also need to fit models to the data, decide on a final model, and apply it to the data; then decide on a premium per policy with some additional loading.
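As a sketch of that workflow (fit a GLM, score the policies, then add a loading), here is a minimal Poisson frequency GLM with a log link, fitted by IRLS/Newton in plain numpy as a stand-in for R’s `glm()`. The synthetic data, the rating factors, the flat severity, and the 20% loading are all illustrative assumptions, not recommendations.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Synthetic portfolio with two made-up rating factors.
veh_value = rng.uniform(5, 50, n)
drv_age = rng.uniform(18, 80, n)
X = np.column_stack([np.ones(n), veh_value, drv_age])
true_beta = np.array([-3.0, 0.02, -0.01])
y = rng.poisson(np.exp(X @ true_beta))          # observed claim counts

# IRLS / Newton iterations for the Poisson log-likelihood:
# beta <- beta + (X' W X)^-1 X' (y - mu), with W = diag(mu).
beta = np.zeros(3)
for _ in range(25):
    mu = np.exp(X @ beta)
    beta += np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))

expected_freq = np.exp(X @ beta)                # expected claims per policy
avg_severity = 1_000.0                          # flat severity assumption
pure_premium = expected_freq * avg_severity
premium = pure_premium * 1.20                   # 20% loading on top
```

The loading step is where the winner’s-curse discussion above comes in: a flat 20% is the simplest choice, but a per-policy loading (point 2 in the list) would vary it with how sparse the data is around each risk.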