Round 2 is live!
There are some important changes in this round, and we are excited to see how you tackle them!
1. Round 2 submissions are code based
2. You can choose your own subset of the vocabulary now
Code-based Submissions
Unlike Round 1, which was CSV-based, in this round you will have to submit your full code, which will be run on our evaluation infrastructure.
Each submission will have access to the following resources during evaluation:
- 4 CPU cores
- 16 GB RAM
- 1 NVIDIA K80 GPU (optional; needs to be enabled in `aicrowd.json`)
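For example, enabling the GPU is typically a one-line flag in your repository's `aicrowd.json` (a minimal sketch; the `challenge_id` value here is a placeholder, and your starter kit's file may contain additional keys):

```json
{
  "challenge_id": "your-challenge-id",
  "gpu": true
}
```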
All submissions will have 10 minutes of setup time to load their models and run any preprocessing they need; after that, they are expected to make a single prediction in less than 1 second (per SMILES string).
Check out this starter kit to get started, and make your first submission!
Choose your own vocabulary
For Round 2, you can choose a subset of the whole vocabulary (composed of 109 smell words) and create your own, if you believe that it improves your accuracy.
Read on to understand how it works.
Let's define:

- `voc_gt` (the ground truth vocabulary): the set of smell words in the actual challenge dataset (ground truth), i.e. the 109 distinct smell words present in the training and test sets of Round 1.
- `voc_x` (the submission vocabulary): a subset of `voc_gt` on which participants choose to train their models and from which they sample their predictions. `voc_x` has to be composed of at least 60 distinct smell words. It is estimated as the set of all distinct smell words used across all the predictions made by the model.
- `model_compression`: we define the model compression as `1 - [len(voc_x) / len(voc_gt)]`.
- For every 1% of model compression, we expect an improvement in accuracy of at least 0.5%.
- `top_5_TSS_voc_x`, `top_2_TSS_voc_x`: the `top_5_TSS` and `top_2_TSS` computed using the vocabulary chosen by the participant. When computing these metrics, any smell word which is not present in `voc_x` is removed from the ground truth sentences.
  - `top_5_TSS`: the Jaccard index computed using the top-5 sentences in comparison to the ground truth (as described for Round 1).
  - `top_2_TSS`: the Jaccard index computed using the top-2 sentences (as opposed to the top 5 for `top_5_TSS`).
- `top_5_TSS_voc_gt`, `top_2_TSS_voc_gt`: the `top_5_TSS` and `top_2_TSS` computed using the vocabulary present in the ground truth data. Here, these are exactly the same as `top_5_TSS` and `top_2_TSS`.
- Finally, `adjusted_top_5_TSS` and `adjusted_top_2_TSS`: the adjusted scores are computed like this:

```python
if (top_5_TSS_voc_x - top_5_TSS_voc_gt) >= 0.5 * model_compression:
    adjusted_top_5_TSS = top_5_TSS_voc_x
    adjusted_top_2_TSS = top_2_TSS_voc_x
else:
    adjusted_top_5_TSS = top_5_TSS_voc_gt
    adjusted_top_2_TSS = top_2_TSS_voc_gt
```
So, if the improvement in accuracy between `voc_x` and `voc_gt` is at least the expected `0.5 * model_compression`, then we use the improved `voc_x` accuracy; otherwise we use the original `voc_gt` accuracy.
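Putting these definitions together, the adjusted-score logic can be sketched in Python (a minimal sketch based on the definitions above; the function names are ours, and the actual evaluation code may differ):

```python
def model_compression(voc_x, voc_gt):
    """1 - len(voc_x) / len(voc_gt): the fraction of the vocabulary dropped."""
    return 1 - len(voc_x) / len(voc_gt)

def adjusted_scores(top5_voc_x, top2_voc_x, top5_voc_gt, top2_voc_gt, compression):
    """The top-5 comparison decides; the top-2 score follows the same branch."""
    if (top5_voc_x - top5_voc_gt) >= 0.5 * compression:
        return top5_voc_x, top2_voc_x
    return top5_voc_gt, top2_voc_gt

# Dropping 29 of the 109 words gives ~26.6% compression, so the voc_x score
# must beat the voc_gt score by at least ~0.133 for the switch to happen.
c = model_compression(voc_x=range(80), voc_gt=range(109))
print(round(0.5 * c, 3))                           # -> 0.133
print(adjusted_scores(0.62, 0.55, 0.50, 0.45, c))  # -> (0.5, 0.45): falls back
```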
The leaderboard is sorted based on `adjusted_top_5_TSS` as the primary score and `adjusted_top_2_TSS` as the secondary score.
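As an illustration of that ordering (the team names and scores here are made up), ties on the primary score are broken by the secondary score:

```python
submissions = [
    {"team": "A", "adjusted_top_5_TSS": 0.61, "adjusted_top_2_TSS": 0.40},
    {"team": "B", "adjusted_top_5_TSS": 0.61, "adjusted_top_2_TSS": 0.45},
    {"team": "C", "adjusted_top_5_TSS": 0.58, "adjusted_top_2_TSS": 0.50},
]
# Higher is better for both metrics; tuple comparison handles the tie-break.
leaderboard = sorted(
    submissions,
    key=lambda s: (s["adjusted_top_5_TSS"], s["adjusted_top_2_TSS"]),
    reverse=True,
)
print([s["team"] for s in leaderboard])  # -> ['B', 'A', 'C']
```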
During the course of Round 2, all scores are based on 60% of the whole test data; the final leaderboard, computed on the whole test data, will be released at the end of Round 2.
Cheers!