Hi, all,
The most difficult problem in this competition is class imbalance.
The pre- and post-Alzheimer classes are tiny compared with the Normal class.
Many people tackle this by downsampling the Normal class, like @moto does.
That works, but exploiting the rest of the Normal samples should be even better:
import pandas as pd
from sklearn.model_selection import KFold

# One fold per bag, so every Normal sample ends up in exactly one bag.
kf = KFold(n_splits=(num_total_neg // num_neg))
for _, idx in kf.split(df_neg):
    bagging_neg = df_neg.iloc[idx]  # .iloc, not [], to select rows by position
    df_samples = pd.concat([df_pos, bagging_neg])
    # ... train a model on df_samples ...
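For completeness, here is a minimal sketch of the whole bagging loop as I understand it. The bagged_predict function, the RandomForestClassifier, and the feature_cols / label names are my own placeholders, not @moto's actual setup: each bag trains its own model, and the test predictions are averaged at the end.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

def bagged_predict(df_pos, df_neg, df_test, feature_cols, label="label"):
    # One bag per positive-set-sized chunk of negatives (one common choice);
    # KFold needs at least 2 splits.
    n_bags = max(len(df_neg) // len(df_pos), 2)
    kf = KFold(n_splits=n_bags, shuffle=True, random_state=0)
    probs = None
    for _, idx in kf.split(df_neg):
        # Each bag = all positives + one disjoint chunk of negatives.
        df_samples = pd.concat([df_pos, df_neg.iloc[idx]])
        model = RandomForestClassifier(random_state=0)  # placeholder model
        model.fit(df_samples[feature_cols], df_samples[label])
        p = model.predict_proba(df_test[feature_cols])
        probs = p if probs is None else probs + p
    return probs / n_bags  # average the per-bag class probabilities

Because the folds are disjoint, every Normal sample contributes to exactly one model, so no data is thrown away as it would be with plain downsampling.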
@moto’s wonderful notebook improves from LB 0.616 to LB 0.610 with this bagging (though I haven’t compared it against seed averaging).
I hope it helps.