I have a question regarding one of your statements in yesterday’s town hall meeting for the Learning to Smell challenge.
You mentioned that rearranging the SMILES can improve accuracy on tasks. I have been trying to find a way to use this, but have not yet been successful. I found your contribution to RDKit for this, which works fine, but now I am stuck on how to use these additional SMILES. Any fingerprint-type embedding will be the same for all of the generated SMILES, so the extra SMILES are of no use with fingerprint embeddings. I have tried several ways to represent SMILES without embeddings, such as char_to_int conversion with zero padding fed into LSTMs, but none of them predict above chance level. My background is not in chemistry, so I am likely missing something quite obvious here due to my lack of domain knowledge.
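To make the setup concrete, here is a minimal sketch of the kind of char_to_int encoding with zero padding described above. It is an illustration only: the three SMILES strings are hand-written alternatives for ethanol rather than RDKit output, and the helper names `build_vocab` and `encode_smiles` are hypothetical, not taken from RDKit or the challenge code.

```python
from typing import Dict, List

# Three valid SMILES strings for the same molecule (ethanol), written by hand
# for illustration; they are not generated by RDKit here.
ETHANOL_SMILES = ["CCO", "OCC", "C(O)C"]


def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Map each character to a positive integer; 0 is reserved for padding."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode_smiles(smiles: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert a SMILES string to integer ids and right-pad with zeros."""
    ids = [vocab[ch] for ch in smiles]
    if len(ids) > max_len:
        raise ValueError(f"SMILES longer than max_len={max_len}: {smiles}")
    return ids + [0] * (max_len - len(ids))


vocab = build_vocab(ETHANOL_SMILES)
max_len = max(len(s) for s in ETHANOL_SMILES)
encoded = [encode_smiles(s, vocab, max_len) for s in ETHANOL_SMILES]

for smiles, ids in zip(ETHANOL_SMILES, encoded):
    print(f"{smiles:>6} -> {ids}")
```

The three strings describe the same molecule but produce three different integer sequences, so each rearranged SMILES is a distinct training example for a sequence model, whereas a fingerprint would give one vector for all three. One caveat with a purely character-level scheme like this is that two-letter atoms such as `Cl` and `Br` are split into two tokens.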
Could you please point us toward a type of input representation that can make use of these newly generated SMILES?
Thank you in advance.
Cas van Boekholdt