Undersampling + bagging: a 0.006 LB boost

Hi, all,

The most difficult problem in this competition is class imbalance.
The pre- and post-Alzheimer classes are tiny compared with the Normal class.
To tackle this, many people use downsampling, like @moto.
That works, but exploiting the rest of the Normal samples should be even better.

import pandas as pd
from sklearn.model_selection import KFold

# df_pos: positive (pre/post-Alzheimer) samples, df_neg: Normal samples.
# Split the Normal samples into as many folds as fit the positive class size,
# so every Normal sample ends up in exactly one balanced training set.
kf = KFold(n_splits=num_total_neg // num_neg)
for _, idx in kf.split(df_neg):
    bagging_neg = df_neg.iloc[idx]                 # one "bag" of Normal samples
    df_samples = pd.concat([df_pos, bagging_neg])  # all positives + this bag
    # ... train one model per bag ...
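To make the bagging step concrete, here is a minimal sketch of how the per-bag models could be aggregated, assuming a scikit-learn-style classifier and a hold-out feature matrix. The names make_classifier, X_test, and the "target" column are placeholders I've assumed, not from the original notebook:

import numpy as np

fold_preds = []
for _, idx in kf.split(df_neg):
    bagging_neg = df_neg.iloc[idx]
    df_samples = pd.concat([df_pos, bagging_neg]).sample(frac=1, random_state=0)  # shuffle the balanced set
    clf = make_classifier()  # placeholder: build a fresh model for each bag
    clf.fit(df_samples.drop("target", axis=1), df_samples["target"])
    fold_preds.append(clf.predict_proba(X_test)[:, 1])

# Bagging: average the per-bag predictions
final_pred = np.mean(fold_preds, axis=0)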

@moto's wonderful notebook (LB 0.616) gets better (LB 0.610) with this bagging (though I haven't compared it to seed averaging).
I hope it helps.

Nice work @jsato.
I believe the final solution will be a big ensemble of many model types, with each model type trained on many folds and runs.

Thanks @moto.
An example is here.
Upvote it if you like.

Upvoted @jsato.

BTW, do you plan to team up with someone?

Yes, if someone wants to.