Launching the 3rd and Final Round of Learning to Smell Challenge 🎉

Dear Participants,

We are excited to announce the launch of the final round of the Learning to Smell challenge. :tada:

These are the changes for the final round :point_down:

  • The final round solutions are tested on an internal dataset that has been curated by our partners at Firmenich.

  • This round focuses only on a subset of smell words. Check out the list here (round3-vocabulary).

  • We expect your models to be adapted to make predictions using only the above-mentioned smell vocabulary terms.

And, finally, a 6,000 CHF cash prize pool is up for grabs now!

All the best,
Team AIcrowd


Anyone who is trying to adapt their previous-round codebase to the vocabulary change can use this snippet:

vocabulary = set(open(self.vocabulary_path).read().split())
[...]
prediction_arr = list(map(lambda x: list(set(x) & vocabulary), prediction_arr))
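For anyone who wants to try the filter in isolation, here is a minimal self-contained sketch. It assumes the vocabulary file holds whitespace-separated smell words and that each prediction is a list of words. The names `load_vocabulary` and `filter_predictions` and the toy data are made up for illustration and are not part of the starter kit. Unlike the set intersection above, it keeps the original order of the words.

```python
from pathlib import Path


def load_vocabulary(path):
    """Read a whitespace-separated vocabulary file into a set (assumed format)."""
    return set(Path(path).read_text().split())


def filter_predictions(prediction_arr, vocabulary):
    """Drop every word that is not in the vocabulary, keeping the original order."""
    return [
        [word for word in sentence if word in vocabulary]
        for sentence in prediction_arr
    ]


# Toy data for illustration only, not the real round 3 vocabulary.
toy_vocabulary = {"floral", "fruity", "woody"}
toy_predictions = [["floral", "sharp", "woody"], ["burnt", "fruity"]]

filtered = filter_predictions(toy_predictions, toy_vocabulary)
print(filtered)
```

Note that a prediction can end up empty if none of its words are in the vocabulary, so it may be worth checking for that case before submitting.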

Dear all,

This weekend a new paper came out that uses an Xception + DNN “dual” model. Be imaginative…

The dataset is available (4040 molecules) and mixes perfumery and flavor descriptors (unlike ours).
You really should implement it. My only concern is the speed of Xception in Keras (https://github.com/Abdulk084/Chemception); maybe Kekulescope in PyTorch is faster (https://github.com/isidroc/kekulescope/).

https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.0c01288

An important point: the authors did not provide the negative (“odorless”) compounds. I would really suggest adding them too (but you have to go to PubChem and grab those 1229 molecules). We all know that it’s very important.

The “sharp” descriptor is not an olfaction term, so you can remove it!

Good luck for the last round!

Guillaume

They claim 97% accuracy. Does this mean that the problem is solved?

Unfortunately, the 97% accuracy is based on unbalanced labels. It always sounds good to say 97%, but it’s misleading:

  • Let’s say one descriptor is active for 1% of the molecules and your model predicts only “not active” for all molecules; then your accuracy for this descriptor is > 98%! That’s why we ask for top-2 / top-5 proposals. Missing one term is acceptable, but if you always predict Floral, the model is not good even though its “Acc” is.

So to answer your question: no, it’s not solved. It may be useful, but it’s definitely not the solution.
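To put numbers on the accuracy argument, here is a small sketch with made-up labels: 1,000 molecules, 10 of them active for one descriptor, and a dummy model that always answers “not active”. The Jaccard index is included only as one example of a score that does not reward this behaviour; it is an illustration, not the challenge’s official scoring code.

```python
def accuracy(y_true, y_pred):
    """Fraction of molecules whose label is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def jaccard(y_true, y_pred):
    """Jaccard index over the positive ("active") labels."""
    intersection = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    union = sum(1 for t, p in zip(y_true, y_pred) if t or p)
    return intersection / union if union else 1.0


# Made-up data: the descriptor is active for 1% of 1,000 molecules.
y_true = [1] * 10 + [0] * 990
# Dummy model: always predicts "not active".
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
jac = jaccard(y_true, y_pred)
print(f"accuracy = {acc:.2%}, jaccard = {jac:.2f}")
```

The dummy model gets every inactive molecule right and every active one wrong, so accuracy looks excellent while the overlap-based score is zero.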