How scripts are run in weekly eval

I just want to make sure I understand correctly… the overview says models will be evaluated over 4 years…

1. Year 1 with access to data from year 1.
2. Year 2 with access to data from years 1 - 2...

So I think this would be better worded as "Year 1 with access to data from year 0" and so on. In other words, in year k the data passed into predict() is from years 0:(k-1). Is this correct?

Then I am wondering: is the server running my scripts 4 separate times in new environments, one for each year, for this weekly eval? Meaning there is no loading of the .RData file I submit (which is only used for the RMSE submissions), and no passing of objects into this weekly eval (aside from sourcing the functions in the .R scripts)? That is, it is run for the year 1 prediction with year 0 data provided to the fitting method, then run again with new raw data for years 0-1, and so on?

If that is all true, am I right to assume any data or other objects put into the submitted scripts would be prevented from being sourced into the environment?


Hi @bikeactuary

Regarding the RMSE dataset access: when you are predicting year K you have access to data from years 1-K (including year K). That is why it's written that way.

The code computing RMSE is shared here to clear things up a bit more as well.

Regarding the weekly evaluation
No, your code does not get reloaded several times. We load your model only once, including everything it needs, and then we pass rows of the data to this model for prediction. This is similar to an insurance company, where each row represents a new customer for whom you must offer a premium price.

So any data and objects you put into the submitted scripts would be loaded as normal via the load_model function.
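To put that in code, a rough sketch of what happens on our side each week might look like the following, assuming the submission template's load_model / predict_premium interface. The file names and data object names here are only illustrative, not the actual server code:

```r
# Rough sketch of the weekly evaluation flow on the server side (illustrative only).
# Your submitted .R script defines load_model() and predict_premium(); fit_model()
# is not called here, and your model is loaded exactly once.

source("model.R")                        # your submitted script (file name assumed)

model <- load_model()                    # includes any data/objects you saved with it

x_weekly <- read.csv("weekly_data.csv")  # the week's market data; file name illustrative

# Each row is treated like a new customer asking for a quote.
prices <- predict_premium(model, x_weekly)
```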

Does that clarify things a bit more?

Unfortunately it makes them murkier.

  1. Year 1-k?
  2. So then the fit_model script is never used on server side? Or how is it used in evaluation if so?
  3. We pass all the model/objects loaded for prediction then?
  4. So no reason to attach packages used for model fitting, unless they are also needed for a predict method, am I right?
  5. It sounds then that, unlike an insurance company, we cannot learn and update over time on the server side (e.g. either in a streaming way, or iteratively each incremental year). It (the profit competition) is more like an insurance company prospectively pricing a bigger portfolio for 5 years, the first 4 of which have already transpired, and the bigger prospective portfolio includes some of the same historical exposures that it has already seen/earned. Is that accurate?
  6. So, to be clear, we predict on years 0:5 on the eval side, where we trained on 0:4?
  7. What effectively is the difference between "passing rows" of data versus passing the entire dataset? Can we do streaming learning? When and how is the response shared (claims and whether we won the policy in the market) for model updating?

What am I missing here? If my new understanding above is correct, then it would follow that my model could do things like, for example, hash all policies (from the training set) with no claims in years 1-4 and offer them a $1 policy in the eval. Since the bigger portfolio includes some of those same earned exposures (meaning a policy earned over a specific time period, like year 1), I could basically guarantee I win those (unless others pursue my strategy also) and lower the bar for how well I must price the rest, which I did not see in training.

Apologies for that, I will answer point by point below:

Answers to your questions point by point :writing_hand:

  1. The point of the algorithm used for RMSE prediction is only to prevent the situation where your model can look at the same policy (e.g. PLXXXXX) in the future to guess if it had an accident in the past. Remember that the pol_no_claims_discount column contains this information. So when we compute RMSE for the leaderboard data, we don’t allow your model to peek into the future. We do this by making predictions 4 times on 4 different datasets, each with 1 new year of history added. Then at the end we gather those predictions and compute the RMSE cumulatively (see the sketch after this list).

  2. The fit_model is there in case we need to retrain your model. It’s also used as part of the research being conducted in this challenge :man_scientist:

  3. For prediction of prices, we take the leaderboard data as well as the result of your load_model function and pass them both to your predict_expected_claim and predict_premium functions.

  4. Your model needs to be entirely re-trainable. So anything that you have used for fitting needs to be included in the appropriate files.

  5. (see my answer below for an expanded discussion).

  6. No you just predict on year 5 on the evaluation side. The final test data consists of only 100K rows, each with a unique policy ID.

  7. The difference between “passing rows” and passing the entire dataset in this particular case relates to RMSE: we don’t give your model the full 4 years of history in one dataset; we pass them as 4 different datasets, as described on the overview page. The claim_amount column is never exposed to your model on any leaderboard.
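To make points 1 and 7 a bit more concrete, here is a rough R sketch of how the cumulative computation could look. The names leaderboard_by_year and actual_claims_by_year are made up for this illustration, and keeping only the newest year's predictions is my shorthand for "gathering cumulatively"; the authoritative version is the script linked earlier in this thread.

```r
# Illustrative sketch of points 1 and 7: predictions are made 4 times, on 4
# cumulative datasets, and the RMSE is computed over the gathered predictions.
# `leaderboard_by_year` and `actual_claims_by_year` are invented names for this
# sketch; the real implementation is the script linked earlier in the thread.

model <- load_model()

all_preds  <- c()
all_actual <- c()

for (k in 1:4) {
  # Dataset k contains years 1..k only, so the model never sees a policy's future.
  data_k  <- do.call(rbind, leaderboard_by_year[1:k])
  preds_k <- predict_expected_claim(model, data_k)

  # Gather the predictions for the newest year and accumulate them.
  newest     <- data_k$year == k
  all_preds  <- c(all_preds, preds_k[newest])
  all_actual <- c(all_actual, actual_claims_by_year[[k]])  # claim_amount stays server-side
}

rmse <- sqrt(mean((all_preds - all_actual)^2))
```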

Answer to point 5
This competition aims to recreate an insurance market, including some of the competitive dynamics. One way to categorise learning in an insurance company, to my understanding, is:

  1. How to better estimate claims
  2. How to adapt to market competition

Both of these, in a real market, occur simultaneously and depend on the way you perform: year after year you get new data, retrain, and so on.

In this competition, most of the learning from one leaderboard to the next is isolated in point (2), since we give you all the data you would need for point (1). The idea is that you should look at your market feedback and try to figure out how to adapt your strategy for the next week, and so on.

Answer to your comment after the points :speech_balloon:

What am I missing here? If my new understanding above is correct, then it would follow that my model could do things like, for example, hash all policies (from the training set) with no claims in years 1-4 and offer them a $1 policy in the eval. Since the bigger portfolio includes some of those same earned exposures (meaning a policy earned over a specific time period, like year 1), I could basically guarantee I win those (unless others pursue my strategy also) and lower the bar for how well I must price the rest, which I did not see in training.

In the final test set you only get year 5 data. This would be from 100K policies, 60K of which you have 4 years of history for.

You could hash the policies with no claims and hope that in year 5 they continue to not have a claim; this is completely fine, and you can test whether it’s a good idea on the RMSE leaderboard :muscle:
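If you wanted to try it, a minimal sketch of that strategy in R could look something like this. The names training_data, id_policy and claim_amount, and the 1.2 loading, are assumptions on my part, and in a real submission the ID list would have to be saved alongside your model so that load_model brings it back at prediction time:

```r
# Sketch of the "price known claim-free policies at $1" idea (names are assumed).
library(dplyr)

# During training: collect policies with no claims across the 4 training years.
no_claim_ids <- training_data %>%
  group_by(id_policy) %>%
  summarise(total_claims = sum(claim_amount), .groups = "drop") %>%
  filter(total_claims == 0) %>%
  pull(id_policy)

# During pricing: undercut every policy the strategy recognises from training.
predict_premium <- function(model, x_raw) {
  premiums <- predict_expected_claim(model, x_raw) * 1.2   # illustrative profit loading
  premiums[x_raw$id_policy %in% no_claim_ids] <- 1
  premiums
}
```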

Please let me know if things are still not clear and we’ll clarify it :slight_smile:

OK, a little clearer now. I still have some quibbles with your interpretation, but I think I at least understand what you do. Thank you for your answers.

  1. "…when we compute RMSE for the leaderboard data, we don’t allow your model to peak into the future" But my model does peak into the future, because it is trained on 4 years of data which overlap with the 4 years in the RMSE eval set right? Even if those 4 years don’t contain the same policies, they are policies sourced from the same period in time, and include all of the latent information about those periods of time in the “year” column. That said, I’m not too concerned because in the end the RMSE is not the eval metric for the competition.
  2. ** fit_model is there in case we need to retrain your model.** ok, makes sense. I thought I needed to script this to anticipate running separate for year 1, 2, 3 retraining or updating with new data and so on.
  3. Premiums can be predicted with 100% accuracy by an insurance company since the company sets that price. Only claims are predicted, but I think I follow what you’re saying. By Leaderboard data I think you mean the profit/competition side data (not RMSE board).
  4. Ok - In order to run successfully without error? Or are you saying you want it to be and the rules say it ought to? 2 different things and the source of my original confusion. Now I think based on your reply that it does not need to be because fit_model is never run.
  5. We have trained a model to 4 years of data, and the prediction/competition will be on year 5 exclusively (“Out of time”) with a mix of completely independent (new policies) and some repeat measure (same policies). I think I’m clear on that now?
  6. Ok, so year 5 for the profit competition side? the 4 years are just the RMSE side…
  7. Ok 4 years and 4 separate RMSE’s calculated for that side of it.

Sorry to be repeating myself on some things by going point by point.

Yes, it’s not perfect the way we do it, but I think you have mostly got it!

  1. Yes, your model will probably have learned things from year 4 that it can apply to year 3, so in that sense you are right. That’s OK for RMSE as far as we are concerned. But what you cannot do is look at a specific policy in the RMSE data with year = 4, see that its pol_no_claims_discount value has increased since year 3, and then deduce that it had an accident in year 3. That’s what we are preventing here (see the sketch after this list).

  2. Perfect :+1:

  3. Ah, I mean for the generation of prices, not prediction. This is the process by which your model generates prices on the server side.

  4. It does need to run without error, and that is part of the rules. So both of the things you say are correct. Basically, the model you submit should be trained with the same code; we may retrain your model based on your code and should be able to reproduce your prices.

  5. Yes!

  6. Yes again :slight_smile:

  7. Exactly
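To illustrate point 1, this is the kind of look-ahead the year-by-year split rules out. It is only a demonstration of the leak (you never actually receive a policy's future years in one dataset), and the id_policy column name and the use of dplyr are my assumptions here:

```r
# The leak the year-by-year evaluation prevents: if years 1-4 of the same policy
# were ever in one dataset, an increase in pol_no_claims_discount from one year
# to the next would reveal an accident in the earlier year.
library(dplyr)

leaked <- hypothetical_full_history %>%   # a dataset you never actually get
  arrange(id_policy, year) %>%
  group_by(id_policy) %>%
  mutate(had_claim_previous_year = pol_no_claims_discount > lag(pol_no_claims_discount)) %>%
  ungroup()
```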
