Ideas for an eventual next edition

Hey all,

I’m sure lots of people have ideas for a future edition. I thought now might be the last chance to discuss them.

Here are mine:

  1. Change the way we are given the training data set so that we are always tested “in the future”. This would involve gradually feeding us a larger training set. It would look like this: say the whole dataset spans 5 years and is split into folds A, B, C, D, E and F (a policy_id is always in the same fold).
    Week 1: we train on year 1 for folds A, B, C and D; we are tested on year 2 for folds A, B and E.
    Week 2: same training data; tested on year 2 for folds C, D and F.
    Week 3: new training data: we now have years 1-2 for folds A, B, C, D; tested on year 3 for folds A, B and E.
    Week 4: same training data; tested on year 3 for folds C, D and F.
    Week 5: new training data: we now have years 1-3 for folds A, B, C, D; tested on year 4 for folds A, B and E.
    Week 6: same training data; tested on year 4 for folds C, D and F.
    Week 7: new training data: we now have the full training set (years 1-4); tested on year 5 for folds A, B and E.
    Week 8 (final): same training data; tested on year 5 for folds C, D and F.

A big con is that inactive people would need to at least refit their models on weeks 3, 5 and 7. A solution would be to have Ali & crew refit inactive people’s models on the new training set using the fit_model() function.
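The schedule above can be written down as a small loop. This is just a hypothetical sketch of the proposal (the fold names, 8-week cadence and alternating test folds follow the example):

```python
# Sketch of the proposed rolling train/test schedule.
# Odd weeks test folds A, B, E; even weeks test folds C, D, F.
schedule = []
train_years = 1
for week in range(1, 9):
    test_year = train_years + 1
    test_folds = ["A", "B", "E"] if week % 2 == 1 else ["C", "D", "F"]
    schedule.append({
        "week": week,
        "train_years": list(range(1, train_years + 1)),
        "train_folds": ["A", "B", "C", "D"],
        "test_year": test_year,
        "test_folds": test_folds,
    })
    if week % 2 == 0:
        # After each even week, a new year of training data arrives,
        # so weeks 3, 5 and 7 start with a bigger training set.
        train_years += 1

for row in schedule:
    print(row)
```

Printing the rows makes it easy to sanity-check that week 7 trains on years 1-4 and tests on year 5, as in the example.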

  2. I wouldn’t do a “cumulative” profit approach because a bad week would disqualify people and would make them create a new account to start from scratch, which wouldn’t be fun and also would be hell to monitor. However, a “championship” where you accumulate points like in Formula 1 could be interesting. A “crash” simply means you earn 0 points. I’d only start accumulating points about halfway through the championship so that early adopters don’t have too big of an advantage. I’d also give more points for the last week to keep the suspense.
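A minimal sketch of that scoring idea, reusing the standard F1 points table. The start week, final week and multiplier are made-up parameters, not anything decided:

```python
# Standard Formula 1 points for ranks 1-10; anything else scores 0.
F1_POINTS = {1: 25, 2: 18, 3: 15, 4: 12, 5: 10, 6: 8, 7: 6, 8: 4, 9: 2, 10: 1}

def week_points(rank, week, start_week=5, final_week=8, final_multiplier=2):
    """Hypothetical championship scoring: no points before start_week
    (so early adopters don't run away with it), double points on the
    final week, and a 'crash' (unranked week) simply earns 0."""
    if week < start_week or rank not in F1_POINTS:
        return 0
    pts = F1_POINTS[rank]
    return pts * final_multiplier if week == final_week else pts
```

A season total is then just the sum of `week_points` over the weeks, so a single bad week costs points but never disqualifies anyone.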

  3. Provide a bootstrapped estimate of the variance for the leaderboard by generating a bunch of “small test sets” sampled from the “big test set”.
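Something along these lines, I imagine. The profit numbers and set sizes here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for per-policy profits on the "big test set" (fake numbers).
profits = rng.normal(loc=5.0, scale=50.0, size=100_000)

def bootstrap_profit_std(profits, n_samples=200, subset_size=10_000, rng=rng):
    """Estimate leaderboard noise by scoring many "small test sets"
    resampled with replacement from the "big test set"."""
    means = [
        rng.choice(profits, size=subset_size, replace=True).mean()
        for _ in range(n_samples)
    ]
    return float(np.std(means))

print(bootstrap_profit_std(profits))
```

Reporting this standard deviation next to each leaderboard score would show how much of a ranking difference is just noise.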

  4. We really need to find a way to give better feedback, but I can’t think of a non-hackable way.

  5. We need to find a way so that “selling 1 policy and hoping it doesn’t make a claim” is no longer a strategy that can secure a spot in the top 12. A simple fix is disqualifying companies with less than 1/5 of a normal market share (in our case 10% / 5 = 2%), but I’d rather find something less arbitrary.
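The simple fix in point 5 is a one-line eligibility rule. The function name and defaults are hypothetical, just spelling out the 10% / 5 = 2% arithmetic above:

```python
def eligible_for_top12(market_share, normal_share=0.10, factor=5):
    """Disqualify companies writing less than 1/factor of a 'normal'
    market share: with a 10% normal share, the cutoff is 2%."""
    return market_share >= normal_share / factor

print(eligible_for_top12(0.03))  # above the 2% cutoff
print(eligible_for_top12(0.01))  # below it
```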


Great idea! Having a rolling training set and testing set is more realistic. Also it is more exciting for the participants as they would have a new dataset to play with every 2 weeks.

Some other thoughts of mine:
It would be better for the RMSE leaderboard to have the same format as the final scoring round. I understand that the claim_amount column is dropped in the RMSE leaderboard to avoid probing, but this column is actually available in the final round. This caused some confusion.
To make the weekly feedback more informative, maybe showing a distribution curve would be better than just a mean. This would help us know whether we are targeting the right segments.
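For example, even a handful of quantiles would say more than a single mean. A sketch with made-up per-policy profits:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for one participant's weekly per-policy profits (fake data).
weekly_profits = rng.normal(loc=10.0, scale=100.0, size=5_000)

# A few quantiles instead of just the mean: a heavy loss tail at p5
# with a decent median would suggest a mispriced segment.
quantiles = np.percentile(weekly_profits, [5, 25, 50, 75, 95])
print(dict(zip(["p5", "p25", "p50", "p75", "p95"], quantiles.round(1))))
```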


If we’re throwing out ideas for future competitions:
A possibility might be to split the competition in two: a modelling competition on real data and a pricing competition on simulated data. In the modelling competition, competitors would try to fit real data that has been carefully anonymised. The best entries from the modelling competition could then be used in the pricing competition. For each simulated market, one of the best-fitting models could be drawn at random (maybe with bootstrap sampling to add extra noise while still being somewhat realistic) and used to generate a fake dataset covering multiple years. Each of the entries in the pricing competition could use the fake dataset (excluding the last year) as input to its pricing code, and the last year’s fake data could be used to evaluate the pricing policies. This would allow much more flexibility, as anonymisation and finite datasets would only be an issue in the modelling competition.
There are a couple of obvious downsides. It would be very computationally intensive to repeatedly generate a fake dataset and then allow multiple entries to fit their own internal model before setting prices. Also there is the possibility that the top entries exploit some artifact of the models used to generate the fake data. This wouldn’t really be a productive research outcome as no real world conclusions could be drawn.
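The two-stage pipeline might look something like this in miniature. Everything here is a toy stand-in (the "fitted models" are just Poisson claim frequencies), only meant to show the shape of the idea:

```python
import numpy as np

rng = np.random.default_rng(2)

# Pretend pool of best-fitting models from the modelling competition,
# reduced here to plain Poisson claim frequencies (made-up values).
fitted_rates = [0.08, 0.10, 0.12]

def simulate_market(n_policies=1_000, n_years=5):
    """Draw one fitted model at random and generate a fake multi-year
    claims dataset; the last year is held out to evaluate pricing."""
    rate = rng.choice(fitted_rates)
    claims = rng.poisson(rate, size=(n_years, n_policies))
    train, holdout = claims[:-1], claims[-1]
    return train, holdout

train, holdout = simulate_market()
print(train.shape, holdout.shape)
```

The computational concern above is visible even in this toy: every simulated market means a fresh dataset plus a full refit of every pricing entry.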
