The most difficult problem in this competition is class imbalance: the pre- and post-Alzheimer classes are far smaller than the Normal class.
To tackle this, many people downsample the Normal class, as @moto does.
That works, but exploiting the rest of the Normal samples should be even better: split the negatives into disjoint bags with KFold and train one model per bag.
```python
import pandas as pd
from sklearn.model_selection import KFold

# One bag of negatives per fold, so every Normal sample is used exactly once
kf = KFold(n_splits=num_total_neg // num_neg, shuffle=True)
for _, idx in kf.split(df_neg):
    bagging_neg = df_neg.iloc[idx]            # disjoint subset of negatives
    df_samples = pd.concat([df_pos, bagging_neg])
    # ... train a model on df_samples ...
```
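Here is a minimal self-contained sketch of the idea with made-up toy data (the column, the bag size `num_neg = 20`, and the 10/100 class sizes are all assumptions for illustration; only the variable names follow the snippet above):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical toy data: 10 positive rows, 100 negative (Normal) rows
rng = np.random.default_rng(0)
df_pos = pd.DataFrame({"x": rng.normal(size=10), "label": 1})
df_neg = pd.DataFrame({"x": rng.normal(size=100), "label": 0})

num_total_neg = len(df_neg)   # 100
num_neg = 20                  # desired negatives per bag (assumed)

bags = []
kf = KFold(n_splits=num_total_neg // num_neg, shuffle=True, random_state=0)
for _, idx in kf.split(df_neg):
    bagging_neg = df_neg.iloc[idx]                 # one disjoint bag of negatives
    df_samples = pd.concat([df_pos, bagging_neg])  # all positives + this bag
    bags.append(df_samples)
    # train one model per bag here, then average the predictions

# Every Normal sample lands in exactly one bag, so none are wasted
assert sum(len(b) for b in bags) == len(df_pos) * len(bags) + len(df_neg)
```

Each bag gives a balanced-ish training set, and averaging the per-bag models is what recovers the information that plain downsampling would throw away.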
@moto’s wonderful notebook improved from LB 0.616 to LB 0.610 with this bagging (though I haven’t compared it against seed averaging).
I hope it helps.