Target distribution in the Test Set - LB 0.616 with a simple magic trick

Ha ha, I don’t know what the magic is, but

if I change the ratio of positive to negative labels in my public notebook,

I get 0.616 on the LB.

Enjoy !!!

PS: I am unable to re-upload the notebook, but if you take the one above and replace
nb_neg = nb_pos
in cell 15 with
nb_neg = nb_pos * 2
you will get the same score.
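For anyone who wants to try the tweak without the original notebook, here is a minimal sketch of the idea: keep all positives and sample twice as many negatives instead of an equal number. The DataFrame name, the `label` column, and the helper function are assumptions for illustration, not the notebook's actual code.

```python
import pandas as pd

def resample_negatives(df: pd.DataFrame, ratio: float = 2.0, seed: int = 0) -> pd.DataFrame:
    """Keep all positives, sample `ratio` times as many negatives."""
    pos = df[df["label"] == 1]
    neg_pool = df[df["label"] == 0]
    nb_pos = len(pos)
    nb_neg = int(nb_pos * ratio)  # the tweak: 2x negatives instead of nb_neg = nb_pos
    neg = neg_pool.sample(n=nb_neg, random_state=seed, replace=nb_neg > len(neg_pool))
    # Shuffle the combined sample so the classes are interleaved
    return pd.concat([pos, neg]).sample(frac=1, random_state=seed).reset_index(drop=True)

# Tiny demo: 10 positives, 100 negatives -> 10 positives, 20 negatives
df = pd.DataFrame({"label": [1] * 10 + [0] * 100, "x": range(110)})
balanced = resample_negatives(df, ratio=2.0)
print(balanced["label"].value_counts().to_dict())  # {0: 20, 1: 10}
```

Changing `ratio` shifts the base rate the model sees during training, which in turn shifts its predicted probabilities, so it effectively acts as a crude calibration knob for the leaderboard metric.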


Other ways to handle the imbalanced dataset have been shown by @Johnowhitaker in his great notebook


I just read that notebook and thought hmm… I wonder what would happen if we played with those scaling values. Glad you tried it and that it worked so well!
There definitely seems to be some dark ‘magic’ in this kind of problem. Maybe there is some formal theory, but when I used to do code review for another competition platform there were a lot of ‘artisanal hand-crafted tweaks’ where folks just randomly transformed probabilities or sampling fractions until things did well on the leaderboard :joy:


@Johnowhitaker: You are talking about Kaggle I suppose. There are a lot of tricks there :wink:

Looking at the current LB, we are quite close to the top. I think I should focus on feature engineering rather than playing with magic factors.

Zindi, but it holds for all of them :joy:
As you say, now comes the hard work :slight_smile:


As the saying goes, there ain’t no such thing as a free lunch. :upside_down_face:

It’s time to get imaginative. :man_mechanic: :man_factory_worker: :man_farmer: :clown_face: