This was one of my favorite challenges so far, because the problem formulation is very simple and it attempts to get insight into one of our primal but neglected senses. My solution was far behind the top 2 competitors, so I feel I was missing some crucial ingredient, and I am looking forward to learning about their approach.
The core of my approach is a neural net on fingerprints.

Data: union of various fingerprints extracted with rdkit from the SMILES in the train set:

    from rdkit import Chem
    from rdkit.Chem import AllChem
    from rdkit.Chem import MACCSkeys

    mol = Chem.MolFromSmiles(smiles)
    fp0 = MACCSkeys.GenMACCSKeys(mol)  # MACCS keys
    fp1 = AllChem.GetMorganFingerprintAsBitVect(mol, 2, 256)  # Morgan fingerprints (radius 2, 256 bits)
    fp2 = Chem.RDKFingerprint(mol)  # RDKit topological fingerprint
    # smarts_inteligands has about 305 SMARTS patterns; one True/False flag per pattern
    fp3 = [len(mol.GetSubstructMatch(Chem.MolFromSmarts(smarts))) > 0 for smarts in smarts_inteligands]

Preprocessing: drop constant and duplicate fingerprints
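
A minimal sketch of this preprocessing step, assuming the fingerprints have already been stacked into a 2-D 0/1 matrix with one row per molecule. The function name `drop_constant_and_duplicate_columns` is hypothetical and this is not necessarily how the original code did it:

```python
import numpy as np

def drop_constant_and_duplicate_columns(X):
    """Return (X_reduced, kept_indices) for a 2-D feature matrix.

    Hypothetical helper: removes columns that take a single value and
    keeps only the first copy of columns that are exact duplicates.
    """
    X = np.asarray(X)
    # A column is constant when no row differs from the first row.
    non_constant = np.flatnonzero((X != X[0]).any(axis=0))
    seen = set()
    kept = []
    for j in non_constant:
        key = X[:, j].tobytes()  # hashable signature of the column
        if key not in seen:
            seen.add(key)
            kept.append(int(j))
    kept = np.array(kept, dtype=int)
    return X[:, kept], kept
```

The returned `kept` indices would have to be applied unchanged to the test-set fingerprints so that both matrices keep the same columns.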

Model:
    from torch import nn

    hidden_size = 512
    dropout = .3
    output_size = 75  # one logit per smell

    # input_size is the number of fingerprint bits left after preprocessing
    model = nn.Sequential(
        nn.Linear(input_size, hidden_size),
        nn.ReLU(inplace=True),
        nn.Dropout(dropout),
        nn.BatchNorm1d(hidden_size),
        nn.Linear(hidden_size, hidden_size),
        nn.ReLU(inplace=True),
        nn.Dropout(dropout),
        nn.BatchNorm1d(hidden_size),
        nn.Linear(hidden_size, output_size),
    )

Training was done over 5 folds, each for 25 epochs, with nn.BCEWithLogitsLoss. The model predicts a probability for each of the 75 smells.
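
For readers less familiar with this loss: with its default settings, nn.BCEWithLogitsLoss applies a sigmoid to every output logit and averages the binary cross-entropy over all entries, so each smell is treated as its own yes/no problem. The NumPy sketch below reproduces that computation for illustration only; the function names are hypothetical and the actual training used the PyTorch loss.

```python
import numpy as np

def sigmoid(logits):
    """Numerically stable sigmoid: turns logits into probabilities."""
    logits = np.asarray(logits, dtype=float)
    return np.exp(-np.logaddexp(0.0, -logits))

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly from logits."""
    x = np.asarray(logits, dtype=float)
    y = np.asarray(targets, dtype=float)
    # Stable form of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]
    per_entry = np.maximum(x, 0.0) - x * y + np.log1p(np.exp(-np.abs(x)))
    return float(per_entry.mean())
```

At prediction time, `sigmoid` applied to the 75 logits gives the per-smell probabilities used in the next step.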
The last step was to come up with 5 prediction sequences starting from the individual smell probabilities. For this I sampled smells using their predicted probabilities and found the sequence with the best Jaccard score. Then I found the next sequence with the best incremental Jaccard score, and so on.
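
One possible reading of this selection step, as a rough sketch rather than the original code. It assumes that each smell is sampled independently with its predicted probability, that the sampled sets serve both as stand-ins for the unknown true labels and as the candidate predictions, and that "incremental" means the gain in the best Jaccard score reached so far. The names `jaccard` and `pick_prediction_sets` and the defaults are hypothetical.

```python
import random

def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def pick_prediction_sets(probs, n_picks=5, n_samples=200, seed=0):
    """Greedily pick label sets that maximise the sampled best-of-k Jaccard."""
    rng = random.Random(seed)
    # Monte Carlo stand-ins for the unknown true label set (assumption).
    samples = [
        frozenset(i for i, p in enumerate(probs) if rng.random() < p)
        for _ in range(n_samples)
    ]
    candidates = sorted(set(samples), key=sorted)  # deterministic order
    best = [0.0] * n_samples  # best score so far against each sample
    picks = []
    for _ in range(min(n_picks, len(candidates))):
        remaining = [c for c in candidates if c not in picks]
        # Score a candidate by the total best-of-k Jaccard if it were added.
        choice = max(
            remaining,
            key=lambda c: sum(max(b, jaccard(c, s)) for b, s in zip(best, samples)),
        )
        picks.append(choice)
        best = [max(b, jaccard(choice, s)) for b, s in zip(best, samples)]
    return picks
```

With `probs` set to the 75 predicted probabilities of one molecule, the function returns up to 5 sets of smell indices. Any limit on the number of smells per prediction would need to be enforced on the candidates separately.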
Bells and whistles. Some of the things that made small improvements:
 label smoothing
 weighting labels for training
 weighting fingerprints based on their estimated importance
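
The post does not say which form of label smoothing was used. A common choice for multi-label 0/1 targets, assumed here purely for illustration, pulls every target slightly towards 0.5 before it is passed to the BCE loss; `smooth_labels` and the default `eps` are hypothetical:

```python
import numpy as np

def smooth_labels(targets, eps=0.1):
    """Map hard 0/1 targets to eps/2 and 1 - eps/2 (assumed smoothing form)."""
    targets = np.asarray(targets, dtype=float)
    return targets * (1.0 - eps) + 0.5 * eps
```

With `eps=0.1`, a target of 1 becomes 0.95 and a target of 0 becomes 0.05, which discourages the network from pushing its logits to extreme values.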
Things that didn’t work:
 PCA on features and on labels
 UMAP on features and on labels
 pretraining on 109 labels
 continuous version of IoU loss instead of BCE for training
 various learning rate schedulers
 dropping fingerprints with high correlation to others
 trying another dropout/learning rate
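
For reference, a continuous IoU loss is usually built by replacing the set intersection and union with sums over probabilities. The NumPy sketch below shows one such "soft Jaccard" formulation; whether it matches the variant tried here is an assumption, and `soft_iou_loss` is a hypothetical name.

```python
import numpy as np

def soft_iou_loss(probs, targets, eps=1e-7):
    """1 - soft Jaccard, averaged over rows (one row per molecule)."""
    probs = np.asarray(probs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Products and sums stand in for set intersection and union.
    intersection = (probs * targets).sum(axis=-1)
    union = probs.sum(axis=-1) + targets.sum(axis=-1) - intersection
    # eps keeps the ratio defined when both sets are empty.
    return float(np.mean(1.0 - (intersection + eps) / (union + eps)))
```

The loss is 0 when the predicted probabilities equal the 0/1 targets and approaches 1 when they do not overlap at all.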