RMSE as an evaluation metric

There’s been some great discussion on the weekly profit leaderboard feedback. Figured I’d throw in a recommendation for the Claims Estimation leaderboard too. Using RMSE as the metric implicitly assumes constant error variance, which isn’t true for this dataset (and generally isn’t true for insurance claims). Maybe use a different metric for claims prediction accuracy? If it’s too late for this competition, it may be something to consider in the future.
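Just to illustrate what I mean, here’s a quick toy sketch (my own simulated numbers, not the competition data): with a zero-inflated, heavy-tailed claims distribution, the squared error that RMSE averages over ends up dominated by a handful of large claims.

```python
# Toy sketch with simulated data (not the competition dataset): most policies
# have zero claims and severities are heavy-tailed, so the squared error is
# driven almost entirely by the largest few claims.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
has_claim = rng.random(n) < 0.05                        # ~5% of policies claim
severity = rng.lognormal(mean=7.0, sigma=1.5, size=n)   # heavy-tailed severities
y = np.where(has_claim, severity, 0.0)                  # observed claim cost

pred = np.full(n, y.mean())                              # naive constant prediction
sq_err = (y - pred) ** 2

rmse = np.sqrt(sq_err.mean())
top_1pct_share = np.sort(sq_err)[-n // 100:].sum() / sq_err.sum()
print(f"RMSE: {rmse:,.1f}")
print(f"Share of squared error from the largest 1% of policies: {top_1pct_share:.0%}")
```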

2 Likes

You can optimize whatever metric you want and ignore the RMSE leaderboard. The winner is, after all, determined by the profit leaderboard.

2 Likes

Of course, I’m just thinking there could be a better metric to help participants gauge their performance against other teams. And you are clearly not ignoring the RMSE leaderboard :stuck_out_tongue_winking_eye:

4 Likes

Thanks @lolatu2!

This is a very interesting point. We chose RMSE only because it’s one of the most accessible and widely known metrics. You’re right that perhaps it’s not exactly the best insurance-specific metric.

There have been suggestions to use something like deviance or other metrics as well.

What other metrics did you have in mind?

I’m not sure how you would implement deviance on a leaderboard, but it can certainly be used to select the best model within a team. Maybe MAE or RMSLE? There are pros and cons to whichever metric is selected. Just something to consider, I think.
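If it helps, here’s a rough sketch of how I’d compute the metrics mentioned above (the function names and the assumption that y_true/y_pred are non-negative claim amounts are mine, not anything from the competition API):

```python
# Rough sketch of the candidate metrics; assumes y_true and y_pred are
# non-negative NumPy arrays of claim amounts (names are mine, nothing official).
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmsle(y_true, y_pred):
    # log1p keeps zero-claim policies well-defined and damps the largest claims
    return np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))

def mean_poisson_deviance(y_true, y_pred, eps=1e-9):
    # mean Poisson deviance; the y * log(y / mu) term is taken as 0 when y == 0
    y_pred = np.maximum(y_pred, eps)
    y_safe = np.maximum(y_true, eps)   # avoids log(0); term vanishes when y_true == 0
    term = y_true * np.log(y_safe / y_pred)
    return 2.0 * np.mean(term - (y_true - y_pred))
```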

1 Like

Gini maybe?
I remember seeing some code on Kaggle for a weighted Gini (for differing exposure) written by none other than @nigel_carpenter
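The idea was roughly along these lines, if I remember right (my own rough reconstruction, definitely not his exact kernel): order policies from riskiest to safest by prediction, then compare the cumulative share of actual losses against the cumulative share of exposure.

```python
# Rough reconstruction of an exposure-weighted Gini (not the original kernel):
# sort by predicted risk, accumulate losses vs. exposure, and measure the area
# between that Lorenz-style curve and the exposure diagonal.
import numpy as np

def exposure_weighted_gini(y_true, y_pred, exposure):
    y_true, y_pred, exposure = map(np.asarray, (y_true, y_pred, exposure))
    order = np.argsort(-y_pred)                      # riskiest policies first
    losses, expo = y_true[order], exposure[order]
    expo_share = expo / expo.sum()
    cum_loss = np.cumsum(losses) / losses.sum()      # cumulative share of losses
    cum_expo = np.cumsum(expo_share)                 # cumulative share of exposure
    # area between the loss curve and the diagonal, integrated over exposure
    return np.sum((cum_loss - cum_expo) * expo_share)

def normalized_gini(y_true, y_pred, exposure):
    # normalize by the Gini of a "perfect" model that predicts the actual losses
    return (exposure_weighted_gini(y_true, y_pred, exposure)
            / exposure_weighted_gini(y_true, y_true, exposure))
```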

2 Likes

Yes, Gini is also one of the contenders.

I think a likely solution would be a sort of “performance” leaderboard that participants can sort by a few claim estimation metrics like those: RMSE, Gini, MAE, RMSLE, and maybe something more industry-standard?

Then we can set a “default” and allow participants to see different views as well.

2 Likes

To be fair, whichever criterion is used, surely the number of parameter estimates should be part of the evaluation. For a given measure, which is better: a model with 10 estimates or one with 1,000? Classically it should be the one with 10, as 1,000 might be regarded as overfitting, although in the context of 1M car-years both are probably fine. However, 1,000 parameters should allow a more sophisticated pricing strategy, if the model is predictive.

But the RMSE winner’s model has more value. Consider any fantasy league or, for that matter, investment leagues: to win them you usually have to take a lot of risk, and for the one person who wins, hundreds lose.
Would one of the losers have done better with the RMSE winner’s model? Probably.
Could the RMSE winner be the best pricing manager? Probably not, as theory and practice are different.
Would all our approaches be different if we had to pay for our losses as well as reap the profits? Yes.
Finally, just as star investment managers can go from hero to zero in a year, it is worth remembering that the same can happen in pricing too.