I’m not surprised it was a lucky draw. I don’t use it myself; I just wanted to share how to make a submission using a zip file and give a quick shout-out to the Tweedie model. At the very least you’ll want to do some feature engineering, cap the outliers and find some decent hyperparameters.
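As a minimal sketch of that recipe, here is outlier capping plus a Tweedie fit on synthetic, claim-like data. The thread is about XGBoost, but for illustration this uses scikit-learn's `TweedieRegressor` (assumed available); the data, the 99th-percentile cap, and `power=1.5` are all illustrative choices, not anything from the original posts.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor

rng = np.random.default_rng(0)

# Synthetic claim-like target: many zeros plus a long right tail,
# the kind of distribution a Tweedie objective (1 < power < 2) suits.
n = 2000
X = rng.normal(size=(n, 3))
freq = rng.poisson(lam=np.exp(0.3 * X[:, 0]))       # claim counts
sev = rng.gamma(shape=2.0, scale=100.0, size=n)     # claim severity
y = freq * sev

# Cap outliers: clip the target at its 99th percentile before fitting.
cap = np.quantile(y, 0.99)
y_capped = np.minimum(y, cap)

# power=1.5 is a common compound Poisson-gamma choice; treat it as a
# hyperparameter to tune, like anything else.
model = TweedieRegressor(power=1.5, alpha=0.0, max_iter=1000)
model.fit(X, y_capped)
preds = model.predict(X)
```

In XGBoost the equivalent would be `objective="reg:tweedie"` with `tweedie_variance_power`, but the capping step is the same either way.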
Trust your CV, no need to submit to the leaderboard all the time
I tried using your xgboost parameters on my processed data, with a lot of complex feature engineering, and it didn’t perform any better. I tried ensembling your xgboost with my other base models… and nothing is beating your original xgboost. Craziness… (I’m amusing myself with the RMSE leaderboard, since at least it gives immediate feedback. I’m about out of ideas for the pricing strategy, and none of what we tried worked well.)
A few years ago a member of my team, who is an actuary and Kaggle master, used to delight in tormenting the rest of us when his models out-performed ours on leaderboards.
He would always insist his out-performance was due to his process of selecting lucky seeds.
We all knew full well there’s no such thing as a lucky seed that generalises to the private leaderboard, but many a time I caught myself trying a few different seeds to see if I could get lucky and beat his model.
There is of course nothing wrong with running a few models with different seeds and taking the average result. That’s a recognised ensembling technique, closely related to bagging, which will often improve a model at the cost of implementation complexity.
It’s honestly not a very strong model. I’m pretty sure it was only doing fine until people caught on that they needed to increase their prices a bit.
I’ll probably adjust my profit margin for the last week, but I don’t think I’ll revisit the model much. I’m very happy that I managed not to sink too much time into this competition, and I’d rather keep it that way.
I have spent a lot of time thinking about it, but it doesn’t feel as bad.