Launching the 3rd and Final Round of Learning to Smell Challenge 🎉

vrv · January 15, 2021, 3:46pm

Dear Participants,

We are excited to announce the launch of the final round of the Learning to Smell challenge.

These are the changes for the final round:

The final round solutions are tested on an internal dataset that has been curated by our partners at Firmenich.
This round focuses only on a subset of smell words. Check out the list here (round3-vocabulary).
We expect your models to be adapted so that they make predictions using only the above-mentioned smell vocabulary terms.

And finally, a 6,000 CHF cash prize pool is up for grabs now!

All the best,
Team AIcrowd

vrv · January 15, 2021, 5:08pm

shivam · January 17, 2021, 8:41pm

Anyone who is trying to adapt their previous-round codebase to the vocabulary change can use this diff:

vocabulary = set(open(self.vocabulary_path).read().split())
[...]
prediction_arr = list(map(lambda x: list(set(x) & vocabulary), prediction_arr))
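For anyone who wants to try the idea outside the starter kit, here is a minimal, self-contained sketch of the same filtering step. It assumes the vocabulary holds one term per line and that each prediction is a list of smell words. The helper names and example words are made up for illustration and are not part of the official codebase. Unlike the set intersection above, this version keeps the original order of the words in each prediction.

```python
def load_vocabulary(text):
    """Parse a vocabulary given as one term per line (assumed format)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def restrict_to_vocabulary(predictions, vocabulary):
    """Drop every predicted word that is not in the vocabulary.

    The order of the remaining words in each prediction is preserved.
    """
    return [
        [word for word in words if word in vocabulary]
        for words in predictions
    ]


# Hypothetical vocabulary and predictions, for illustration only.
vocabulary = load_vocabulary("floral\nwoody\nfruity\n")
prediction_arr = [["woody", "sharp", "floral"], ["fruity", "green"]]

filtered = restrict_to_vocabulary(prediction_arr, vocabulary)
print(filtered)
```

Note that filtering can leave a prediction with fewer words than before, or even empty, so it may be worth checking how your submission code handles that case.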

guillaumegodin · January 19, 2021, 6:01am

Dear all,

This weekend a new paper came out using an Xception + DNN “dual” model. Be imaginative…

The dataset is available (4040 molecules) and mixes perfumery and flavor descriptors (unlike ours).
You really should implement it. My only concern is the speed of Xception in Keras (https://github.com/Abdulk084/Chemception); maybe Kekulescope in PyTorch is faster (https://github.com/isidroc/kekulescope/).

https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.0c01288

One important point: the authors did not provide the negative (“odorless”) compounds. I would strongly suggest adding them too (you will have to go to PubChem and fetch those 1229 molecules). We all know how important that is.

The “sharp” descriptor is not an olfaction term, so you can remove it!

Good luck for the last round!

Guillaume

alarih · January 19, 2021, 2:31pm

They claim 97% accuracy. Does this mean that the problem is solved?

guillaumegodin · January 19, 2021, 2:48pm

Unfortunately, the 97% accuracy is based on unbalanced labels. It always sounds good to say 97%, but it is misleading:

Let’s say one descriptor is active for 1% of the molecules and your model predicts “not active” for every molecule; then your accuracy for this descriptor is > 98%! That is why we are asking for top-2 / top-5 proposals. Missing one term is acceptable, but if you always predict Floral, the model is not good even though its “Acc” is.
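The arithmetic behind this can be checked with a small sketch. It assumes a toy set of 100 molecules in which exactly one carries the descriptor, and a model that never predicts it. The numbers and function names are illustrative only and do not come from the paper or the challenge data.

```python
def accuracy(y_true, y_pred):
    """Fraction of molecules whose label is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of truly active molecules that the model finds."""
    positives = sum(y_true)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives if positives else 0.0


# Toy data: the descriptor is active for 1 molecule out of 100 (1%).
y_true = [1] + [0] * 99
# A trivial model that always answers "not active".
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy = {acc:.2f}, recall = {rec:.2f}")
```

The trivial model is right on 99 of the 100 molecules, yet it never finds the one active molecule, which is why accuracy alone says little on rare descriptors.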

So to answer your question: no, it is not solved. The approach may be useful, but it is definitely not the solution.