Round 2 is live!
There are some important changes in this round, and we are excited to see how you tackle them!
1. Round 2 submissions are code-based
2. You can choose your own subset of the vocabulary now
Code-based Submissions
Unlike Round 1, which was CSV-based, in this round you will have to submit your full code, which will be run on our evaluation infrastructure.
Each submission will have access to the following resources during evaluation:
- 4 CPU cores
- 16 GB RAM
- 1 NVIDIA K80 GPU (optional; needs to be enabled in aicrowd.json)
All submissions will have a 10-minute setup time for loading their models and doing any preprocessing they need, and are then expected to make a single prediction in less than 1 second (per SMILES string).
Check out this starter kit to get started, and make your first submission!
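To check that you stay within the 1-second budget, it is worth timing a single prediction locally. A minimal sketch, where `predict` is a hypothetical stand-in for your own model (the SMILES string and returned words are made up):

```python
import time

def predict(smiles: str) -> list:
    """Placeholder for your model; must return a list of predicted smell words."""
    return ["fruity", "sweet"]

# Time one prediction, as the evaluator would.
start = time.perf_counter()
prediction = predict("CCO")  # ethanol, as an example SMILES string
elapsed = time.perf_counter() - start
assert elapsed < 1.0, f"prediction took {elapsed:.3f}s, over the 1s budget"
```

A trivial placeholder will of course pass easily; the point is to wrap your real model's inference in the same timer before submitting.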
Choose your own vocabulary
For Round 2, you can choose a subset of the whole vocabulary (composed of 109 smell words) and create your own, if you believe that it improves your accuracy.
Read on to understand how it works.
Let's define:
- voc_gt (the ground truth vocabulary): the set of smell words in the actual challenge dataset (ground truth), i.e. the 109 distinct smell words present in the training and test sets of Round 1.
- voc_x (the submission vocabulary): a subset of voc_gt on which participants choose to train their models, and from which they sample their predictions. voc_x has to contain at least 60 distinct smell words. It is estimated as the set of all distinct smell words used across all the predictions made by the model.
- model_compression: we define the model compression as 1 - [len(voc_x) / len(voc_gt)]. For every 1% of model compression, we expect an improvement in accuracy of at least 0.5%.
- top_5_TSS: the Jaccard Index computed using the top-5 sentences in comparison to the ground truth (as described for Round 1).
- top_2_TSS: the Jaccard Index computed using the top-2 sentences in comparison to the ground truth (top-2 as opposed to top-5 for top_5_TSS).
- top_5_TSS_voc_x, top_2_TSS_voc_x: top_5_TSS and top_2_TSS computed using the vocabulary chosen by the participants. When computing these metrics, any smell word which is not present in voc_x is removed from the ground truth sentences.
- top_5_TSS_voc_gt, top_2_TSS_voc_gt: top_5_TSS and top_2_TSS computed using the vocabulary present in the ground truth data. Here, these are exactly the same as top_5_TSS and top_2_TSS.
- Finally, adjusted_top_5_TSS and adjusted_top_2_TSS: the adjusted scores are computed like this:
if (top_5_TSS_voc_x - top_5_TSS_voc_gt) >= 0.5 * model_compression:
    adjusted_top_5_TSS = top_5_TSS_voc_x
    adjusted_top_2_TSS = top_2_TSS_voc_x
else:
    adjusted_top_5_TSS = top_5_TSS_voc_gt
    adjusted_top_2_TSS = top_2_TSS_voc_gt
So, if the improvement in accuracy between voc_x and voc_gt is greater than or equal to the expected 0.5 * model_compression, we use the improved voc_x accuracy; otherwise we use the original voc_gt accuracy.
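The rule above can be sketched in Python. This is an illustrative reconstruction, not the official evaluation code: the word sets, team scores, and vocabulary size of 70 are invented for the example, and `jaccard` compares a predicted sentence against a ground-truth sentence, each treated as a set of smell words.

```python
def jaccard(pred, truth):
    """Jaccard index between two sets of smell words."""
    if not pred and not truth:
        return 1.0
    return len(pred & truth) / len(pred | truth)

def adjusted_score(tss_voc_x, tss_voc_gt, voc_x_size, voc_gt_size=109):
    """Keep the voc_x score only when it beats the expected gain."""
    model_compression = 1 - voc_x_size / voc_gt_size
    if tss_voc_x - tss_voc_gt >= 0.5 * model_compression:
        return tss_voc_x
    return tss_voc_gt

# Ground-truth words outside voc_x are dropped before comparison.
voc_x = {"fruity", "sweet", "woody"}
truth = {"fruity", "sweet", "musky"}
pred = {"fruity", "woody"}
tss = jaccard(pred, truth & voc_x)  # "musky" is removed; score is 1/3

# A hypothetical 70-word vocabulary gives ~35.8% compression, so the
# voc_x score must exceed the voc_gt score by ~0.179 to be kept.
assert adjusted_score(0.62, 0.40, voc_x_size=70) == 0.62  # gain of 0.22: kept
assert adjusted_score(0.45, 0.40, voc_x_size=70) == 0.40  # gain of 0.05: rejected
```

The asymmetry is deliberate: shrinking your vocabulary only pays off if the accuracy gain outpaces the 0.5-points-per-percent expectation; otherwise your score falls back to the full-vocabulary metric.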
The leaderboard is sorted based on adjusted_top_5_TSS as the primary score, and the adjusted_top_2_TSS as the secondary score.
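The two-level ordering can be expressed as a single sort key; the team names and scores below are invented for illustration.

```python
# Hypothetical leaderboard entries: (team, adjusted_top_5_TSS, adjusted_top_2_TSS)
entries = [
    ("team_a", 0.61, 0.48),
    ("team_b", 0.61, 0.52),
    ("team_c", 0.58, 0.60),
]

# Sort descending by the primary score, breaking ties with the secondary score.
leaderboard = sorted(entries, key=lambda e: (e[1], e[2]), reverse=True)
print([team for team, *_ in leaderboard])  # ['team_b', 'team_a', 'team_c']
```

Note that team_c's high adjusted_top_2_TSS never comes into play: the secondary score only matters between entries tied on adjusted_top_5_TSS.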
During the course of Round 2, all scores are based on 60% of the whole test data; the final leaderboards, computed on the whole test data, will be released at the end of Round 2.
Cheers!