Hi @alexander_penkin
Yes, the RMSE and weekly leaderboards use separate datasets, and no historical claim information about these contracts is available to you.
To clarify what the leaderboard datasets look like, see below.
RMSE leaderboard data
The data in this leaderboard is a uniform sample similar to your training data. It contains the data of 5K policy holders tracked over 4 years (20K contracts in total). The aim is to give you a general idea of how well your model performs on claim estimation.
Weekly average profit leaderboards
These are 9 weekly leaderboards using the data of approximately 15K policy holders over 4 years (60K contracts in total). Each weekly leaderboard will use a sample of these 60K contracts, with the contracts distributed equally among the 9 weeks.
In addition, we have made it so that the year of the policies in question generally increases with each weekly leaderboard. So, for example, the first weekly leaderboard this Saturday will contain mostly contracts with year = 1 in the data, while the last weekly leaderboard in late February will contain mostly contracts with year = 4. The final test data will contain information about 100K policy holders, all with year = 5, as noted previously in this thread.
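To make the "year generally increases" idea concrete, here is a rough Python sketch of one way such a split could behave. This is only an illustration and not the actual sampling procedure used for the leaderboards. The function name `weekly_splits`, the `noise` parameter, the toy `id_policy` values, and the tiny dataset size are all assumptions made up for this example.

```python
import random

def weekly_splits(contracts, n_weeks=9, noise=0.75, seed=0):
    """Split contracts into n_weeks near-equal chunks whose typical
    year rises from the first chunk to the last (illustrative only)."""
    rng = random.Random(seed)
    # Sort by year plus Gaussian noise so that neighbouring years overlap
    # a little: early weeks are *mostly* low years, not exclusively.
    ordered = sorted(contracts, key=lambda c: c["year"] + rng.gauss(0, noise))
    size, extra = divmod(len(ordered), n_weeks)
    splits, start = [], 0
    for week in range(n_weeks):
        # Spread any remainder over the first `extra` weeks.
        end = start + size + (1 if week < extra else 0)
        splits.append(ordered[start:end])
        start = end
    return splits

# Hypothetical toy data: 15 policy holders x 4 years = 60 contracts.
contracts = [
    {"id_policy": f"PL{p:03d}", "year": y}
    for p in range(15)
    for y in range(1, 5)
]

splits = weekly_splits(contracts)
for week, chunk in enumerate(splits, start=1):
    mean_year = sum(c["year"] for c in chunk) / len(chunk)
    print(f"week {week}: {len(chunk)} contracts, mean year {mean_year:.2f}")
```

If you want to anticipate how your model might behave across the weeks, one option is to check your validation results separately for each value of year rather than only on the pooled data.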
I will update the overview page with some of this information shortly. Please let me know if this doesn’t answer your question.