Target distribution in the Test Set - LB 0.616 with a simple magic trick

Ha ha, I don’t know what the magic is, but

if I change the ratio of positive to negative labels in my public notebook,

I get 0.616 on the LB.

Enjoy !!!

PS: I am unable to re-upload the notebook, but if you take the one above and replace
nb_neg = nb_pos
in cell 15 with
nb_neg = nb_pos * 2
you will get the same score.
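For anyone who wants to try the tweak without the original notebook, here is a minimal sketch of the idea: keep all positives and sample twice as many negatives instead of an equal number. The DataFrame name, the `label` column, and the helper function are assumptions for illustration, not the notebook's actual code.

```python
import pandas as pd

def resample_negatives(df: pd.DataFrame, ratio: float = 2.0, seed: int = 0) -> pd.DataFrame:
    """Keep all positives, sample `ratio` times as many negatives."""
    pos = df[df["label"] == 1]
    neg_pool = df[df["label"] == 0]
    nb_pos = len(pos)
    nb_neg = int(nb_pos * ratio)  # the tweak: 2x negatives instead of nb_neg = nb_pos
    neg = neg_pool.sample(n=nb_neg, random_state=seed, replace=nb_neg > len(neg_pool))
    # Shuffle the combined sample so the classes are interleaved
    return pd.concat([pos, neg]).sample(frac=1, random_state=seed).reset_index(drop=True)

# Tiny demo: 10 positives, 100 negatives -> 10 positives, 20 negatives
df = pd.DataFrame({"label": [1] * 10 + [0] * 100, "x": range(110)})
balanced = resample_negatives(df, ratio=2.0)
print(balanced["label"].value_counts().to_dict())  # {0: 20, 1: 10}
```

Changing `ratio` shifts the base rate the model sees during training, which in turn shifts its predicted probabilities, so it effectively acts as a crude calibration knob for the leaderboard metric.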


Other ways to handle the imbalanced dataset have been shown by @Johnowhitaker in his great notebook


I just read that notebook and thought hmm… I wonder what would happen if we played with those scaling values. Glad you tried it and that it worked so well!
There definitely seems to be some dark ‘magic’ in this kind of problem. Maybe there is some formal theory, but when I used to do code review for another competition platform there were a lot of ‘artisanal hand-crafted tweaks’ where folks just randomly transformed probabilities or sampling fractions until things did well on the leaderboard :joy:


@Johnowhitaker: You are talking about Kaggle I suppose. There are a lot of tricks there :wink:

Looking at the current LB, we are quite close to the top. I think I should focus on feature engineering rather than playing with magic factors.

Zindi, but it holds for all of them :joy:
As you say, now comes the hard work :slight_smile:


As the saying goes, there ain’t no such thing as a free lunch. :upside_down_face:

It’s time to get imaginative. :man_mechanic: :man_factory_worker: :man_farmer: :clown_face: