Hey all,
I’m sure lots of people have ideas for a future edition. I thought now might be the last chance to discuss them.
Here are mine:
Change the way we are given the training data set so that we are always tested “in the future”. This would involve gradually feeding us a larger training set. It would look like this. Let’s say the whole dataset covers 5 years and is split into folds A, B, C, D, E and F (a policy_id is always in the same fold).
Week 1: we train on year 1 for folds A, B, C and D. We are tested on year 2 for folds A, B and E.
Week 2: same training data, but we are tested on year 2 for folds C, D and F.
Week 3: new training data: we now have access to years 1 and 2 for folds A, B, C and D, and we are tested on year 3 for folds A, B and E.
Week 4: same training data, tested on year 3 for folds C, D and F.
Week 5: new training data: we now have access to years 1, 2 and 3 for folds A, B, C and D, tested on year 4 for folds A, B and E.
Week 6: same training data, tested on year 4 for folds C, D and F.
Week 7: new training data: we now have the full training data set (years 1 to 4), tested on year 5 for folds A, B and E.
Final week: same training data, tested on year 5 for folds C, D and F.
A big con is that inactive people would need to at least refit their models on weeks 3, 5 and 7. A solution would be to have Ali & crew refit inactive people’s models on the new training set using the fit_model() function.
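
To make the schedule concrete, here is a rough sketch that generates it. The names (build_schedule, refit_needed, etc.) are made up for illustration and the fold letters are just the ones from my example, so treat it as a sketch rather than anything the organizers use.

```python
# Hypothetical sketch of the rolling "tested in the future" schedule.
TRAIN_FOLDS = ["A", "B", "C", "D"]        # folds we receive training data for
TEST_FOLDS_FIRST = ["A", "B", "E"]        # tested in the first week of each pair
TEST_FOLDS_SECOND = ["C", "D", "F"]       # tested in the second week of each pair


def build_schedule(n_years=5):
    """Return one dict per week: what we train on and what we are tested on."""
    schedule = []
    week = 1
    for test_year in range(2, n_years + 1):
        train_years = list(range(1, test_year))  # every year before the test year
        for half, test_folds in enumerate((TEST_FOLDS_FIRST, TEST_FOLDS_SECOND)):
            schedule.append({
                "week": week,
                "train_years": train_years,
                "train_folds": TRAIN_FOLDS,
                "test_year": test_year,
                "test_folds": test_folds,
                # New training data arrives at the start of each pair after the first.
                "refit_needed": half == 0 and test_year > 2,
            })
            week += 1
    return schedule


schedule = build_schedule()
for row in schedule:
    print(row["week"], row["train_years"], row["test_year"], row["test_folds"])
```

The weeks flagged with refit_needed are the ones where inactive people’s models would have to be refit on the larger training set.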

I wouldn’t do a “cumulative” profit approach, because a bad week would effectively disqualify people and push them to create a new account to start from scratch, which wouldn’t be fun and would also be hell to monitor. However, a “championship” where you accumulate points like in Formula 1 could be interesting. A “crash” simply means you earn 0 points. I’d only start accumulating points about halfway through the championship so that early adopters don’t have too big an advantage. I’d also give more points for the last week to keep the suspense going.
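
Here is a minimal sketch of how the weekly points could work. The points table is the usual Formula 1 one for the top 10; everything else is my assumption for illustration: scoring starts in week 5 of 8, the final week counts double, and a “crash” is represented as a profit of None.

```python
# Hypothetical championship scoring; all parameter values are assumptions.
F1_POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]  # points for positions 1 to 10


def weekly_points(profits, week, first_scoring_week=5, final_week=8, final_multiplier=2):
    """Map each team to the points it earns this week.

    profits: dict of team -> weekly profit, or None if the team "crashed".
    """
    points = {team: 0 for team in profits}
    if week < first_scoring_week:
        return points  # early weeks are played but do not score
    # Rank only the teams that did not crash, best profit first.
    ranked = sorted(
        (team for team, profit in profits.items() if profit is not None),
        key=lambda team: profits[team],
        reverse=True,
    )
    multiplier = final_multiplier if week == final_week else 1
    for position, team in enumerate(ranked[:len(F1_POINTS)]):
        points[team] = F1_POINTS[position] * multiplier
    return points


example = {"team_a": 1200.0, "team_b": 300.0, "team_c": None}
print(weekly_points(example, week=5))
print(weekly_points(example, week=8))
```

A terrible week costs you that week’s points and nothing more, so nobody has a reason to restart with a fresh account.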

Provide a bootstrapped estimate of the variance for the leaderboard by generating a bunch of “small test sets” sampled from the “big test set”.
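
Something along these lines, as a sketch. I’m assuming we have, for each team, its profit on every policy of the big test set (0 where the team didn’t win the policy); the function and argument names are made up. The same resampled policies are used for every team in a given replicate, so the teams stay comparable.

```python
import random
import statistics


def bootstrap_leaderboard(profits_by_team, n_resamples=1000, sample_size=None, seed=0):
    """Estimate mean and standard deviation of average profit per policy.

    profits_by_team: dict of team -> list of per-policy profits, all lists
    aligned on the same policies (assumed input format).
    """
    rng = random.Random(seed)
    n_policies = len(next(iter(profits_by_team.values())))
    size = sample_size or n_policies
    replicates = {team: [] for team in profits_by_team}
    for _ in range(n_resamples):
        # One "small test set": policies drawn with replacement from the big one.
        idx = rng.choices(range(n_policies), k=size)
        for team, profits in profits_by_team.items():
            replicates[team].append(sum(profits[i] for i in idx) / size)
    return {
        team: (statistics.mean(values), statistics.stdev(values))
        for team, values in replicates.items()
    }


toy = {
    "steady": [1.0] * 50,              # small profit on every policy
    "lucky": [2.0] * 49 + [-100.0],    # one large claim among 50 policies
}
for team, (mean, sd) in bootstrap_leaderboard(toy).items():
    print(team, round(mean, 2), round(sd, 2))
```

The spread for a team whose result hinges on one or two large claims comes out much wider than for a team with steady profits, which is exactly what the leaderboard currently hides.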

Really need to find a way to give better feedback, but I can’t think of a nonhackable way.

We need to find a way to make sure that “selling 1 policy and hoping it doesn’t make a claim” is no longer a strategy that can secure a spot in the top 12. A simple fix is disqualifying companies with less than 1/5 of a normal market share (in our case 10% / 5 = 2%), but I’d rather find something less arbitrary.
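
For reference, the simple fix would look roughly like this. The function name and inputs are hypothetical, and I’m assuming 10 competitors per market so that a normal share is 10%. The comparison is done in integers to avoid floating-point trouble right at the 2% boundary.

```python
def eligible_teams(policies_won, n_policies, n_competitors=10, share_divisor=5):
    """Return the teams whose market share is at least normal_share / share_divisor.

    policies_won: dict of team -> number of policies won (assumed input format).
    With 10 competitors and a divisor of 5 the cutoff is 10% / 5 = 2%.
    """
    # won / n_policies >= 1 / (n_competitors * share_divisor), rewritten in integers
    return {
        team
        for team, won in policies_won.items()
        if won * n_competitors * share_divisor >= n_policies
    }


won = {"one_policy_wonder": 1, "borderline": 200, "normal": 1000}
print(sorted(eligible_teams(won, n_policies=10_000)))
```

With 10,000 policies in the market, a team needs at least 200 of them to stay on the leaderboard, so the one-policy strategy is out.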