Round 2 is live!
There are some important changes in this round, and we are excited to see how you tackle them!
1. Round 2 submissions are code-based
2. You can choose your own subset of the vocabulary now
Unlike Round 1, which was CSV-based, in this round you will have to submit your full code, which will be run on our evaluation infrastructure.
Each submission will have access to the following resources during evaluation:
- 4 CPU cores
- 16 GB RAM
- 1 NVIDIA K80 (optional, needs to be enabled in
Each submission gets a 10-minute setup time for loading its models and doing any preprocessing it needs. After that, it is expected to make each single prediction in less than 1 second (per SMILES string).
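Before submitting, it can help to check the per-prediction time budget locally. The snippet below is only a rough sketch: `check_prediction_latency` and `dummy_predict` are hypothetical names, not part of the starter kit, and timings on your machine will differ from those on the evaluation infrastructure.

```python
import time


def check_prediction_latency(predict, smiles_strings, limit_seconds=1.0):
    """Return (smiles, elapsed_seconds) for every input that misses the time limit."""
    slow = []
    for smiles in smiles_strings:
        start = time.perf_counter()
        predict(smiles)
        elapsed = time.perf_counter() - start
        if elapsed >= limit_seconds:
            slow.append((smiles, elapsed))
    return slow


def dummy_predict(smiles):
    # Hypothetical placeholder: a real model would compute sentences of
    # smell words from the SMILES string here.
    return [["fruity", "sweet", "woody"]]


# "CCO" (ethanol) and "c1ccccc1" (benzene) are ordinary SMILES strings.
slow_inputs = check_prediction_latency(dummy_predict, ["CCO", "c1ccccc1"])
```

An empty `slow_inputs` list means every prediction stayed under the limit in this local run.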
Check out this starter kit to get started, and make your first submission!
Choose your own vocabulary
For Round 2, you can choose a subset of the whole vocabulary (composed of 109 smell words) and create your own, if you believe that it improves your accuracy.
Read on to understand how it works.
Let's define:
voc_gt: (the ground truth vocabulary) as the set of smell words in the actual challenge dataset (ground truth), i.e. the 109 distinct smell words present in the training set and test set of Round-1.
voc_x: (submission vocabulary) as a subset of voc_gt, on which participants choose to train their models, and from which they sample their predictions. voc_x has to be composed of at least 60 distinct smell words. voc_x is estimated as the set of all distinct smell words used across all the predictions made by the model.
model_compression: We define the model compression as: 1 - [len(voc_x) / len(voc_gt)].
- For every 1% model compression, we expect to have an improvement in accuracy of at least
top_2_TSS_voc_x: This refers to the top_2_TSS computed using the vocabulary used by the participants. When computing this metric, any smell word which is not present in voc_x is removed from the ground truth sentences.
top_5_TSS: The Jaccard Index computed using the top-5 sentences in comparison to the ground truth (as described for Round 1 above).
top_2_TSS: The Jaccard Index computed using the top-2 sentences in comparison to the ground truth (as opposed to top 5 for top_5_TSS).
top_2_TSS_voc_gt: This refers to the top_2_TSS computed using the vocabulary present in the ground truth data. Here, this is exactly the same as
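To make these definitions concrete, here is a minimal Python sketch. It is not the official evaluation code. The function names are hypothetical, the data layout (a list of sentences per molecule, each sentence a list of smell words) is an assumption, and taking the best Jaccard Index among the first k sentences is also an assumption about how the per-molecule score is aggregated.

```python
def submission_vocabulary(predictions):
    """Estimate voc_x: all distinct smell words used across all predictions."""
    return {
        word
        for sentences in predictions
        for sentence in sentences
        for word in sentence
    }


def model_compression(voc_x, voc_gt):
    """1 - [len(voc_x) / len(voc_gt)], as defined above."""
    return 1 - len(voc_x) / len(voc_gt)


def jaccard(a, b):
    """Jaccard Index of two word collections (0.0 if both are empty, by assumption)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def top_k_tss(sentences, ground_truth, k, vocabulary=None):
    """Score one molecule using its first k predicted sentences.

    If `vocabulary` is given (e.g. voc_x), ground truth words outside it are
    removed first. Using the maximum over the k sentences is an assumption.
    """
    gt = set(ground_truth)
    if vocabulary is not None:
        gt &= set(vocabulary)
    return max((jaccard(s, gt) for s in sentences[:k]), default=0.0)


# Toy data with hypothetical smell words.
example_predictions = [
    [["fruity", "sweet"], ["woody"]],
    [["sweet", "floral"]],
]
voc_x = submission_vocabulary(example_predictions)

ground_truth = ["fruity", "sweet", "musky"]
score_voc_gt = top_k_tss(example_predictions[0], ground_truth, k=2)
score_voc_x = top_k_tss(example_predictions[0], ground_truth, k=2, vocabulary=voc_x)
```

In this toy case "musky" is not in voc_x, so it is dropped from the ground truth and the voc_x score comes out higher than the voc_gt score. A real submission would need at least 60 distinct words in voc_x.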
- The adjusted scores are computed like this:

    if (top_5_TSS_voc_x - top_5_TSS_voc_gt) >= 0.5 * model_compression:
        adjusted_top_5_TSS = top_5_TSS_voc_x
        adjusted_top_2_TSS = top_2_TSS_voc_x
    else:
        adjusted_top_5_TSS = top_5_TSS_voc_gt
        adjusted_top_2_TSS = top_2_TSS_voc_gt
So, if the improvement in top_5_TSS accuracy between voc_x and voc_gt is at least the expected 0.5 * model_compression, then we use the improved voc_x accuracy, else we use the original voc_gt accuracy.
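The adjustment rule can also be written as a small, runnable Python function. This is a sketch under stated assumptions: the function name and argument names are hypothetical, and the top_5 variants of the voc_x and voc_gt scores are assumed to be defined analogously to the top_2 ones.

```python
def adjusted_scores(top_5_voc_x, top_5_voc_gt, top_2_voc_x, top_2_voc_gt,
                    len_voc_x, len_voc_gt=109):
    """Return (adjusted_top_5_TSS, adjusted_top_2_TSS) following the rule above."""
    compression = 1 - len_voc_x / len_voc_gt
    if (top_5_voc_x - top_5_voc_gt) >= 0.5 * compression:
        # The improvement covers the expected gain: keep the voc_x scores.
        return top_5_voc_x, top_2_voc_x
    # Otherwise fall back to the scores on the full ground truth vocabulary.
    return top_5_voc_gt, top_2_voc_gt


# With 60 of 109 words, compression is about 0.45, so the required
# improvement in top_5_TSS is about 0.225 (made-up scores below).
kept = adjusted_scores(0.70, 0.45, 0.50, 0.30, len_voc_x=60)
fallback = adjusted_scores(0.60, 0.45, 0.40, 0.30, len_voc_x=60)
```

Here `kept` uses the voc_x scores because the improvement of 0.25 clears the threshold, while `fallback` reverts to the voc_gt scores because an improvement of 0.15 does not.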
The leaderboard is sorted based on adjusted_top_5_TSS as the primary score, and adjusted_top_2_TSS as the secondary score.
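As an illustration of this ordering, the following sketch sorts a few made-up entries. The team names and scores are hypothetical. It also assumes that higher scores rank first and that the secondary score only breaks ties on the primary one.

```python
submissions = [
    {"team": "team_a", "adjusted_top_5_TSS": 0.52, "adjusted_top_2_TSS": 0.31},
    {"team": "team_b", "adjusted_top_5_TSS": 0.55, "adjusted_top_2_TSS": 0.28},
    {"team": "team_c", "adjusted_top_5_TSS": 0.52, "adjusted_top_2_TSS": 0.35},
]

# Sort by the primary score first, then by the secondary score, both descending.
leaderboard = sorted(
    submissions,
    key=lambda s: (s["adjusted_top_5_TSS"], s["adjusted_top_2_TSS"]),
    reverse=True,
)
ranking = [s["team"] for s in leaderboard]
```

In this example `team_b` comes first on the primary score, and `team_c` is placed ahead of `team_a` because the two tie on adjusted_top_5_TSS and `team_c` has the higher adjusted_top_2_TSS.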
During the course of Round-2, all the scores are based on 60% of the whole test data, and the final leaderboards on the whole test data will be released at the end of Round-2.