How would the score be calculated when the ground truth has fewer than 3 descriptions or more than 3 descriptions?
Should our prediction always have 3 kinds of smell in each sentence?
Agreed, that’s an important question. To test it, you could upload a file in which one instance has fewer than 3 smells in a sentence and see whether it is graded.
It’s relevant because if the model predicts only 1 or 2 labels with relatively high probability, you wouldn’t want a 3rd, unlikely one added, since any additional unmatched item lowers the similarity score.
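To make that concrete, here is a rough sketch of the effect. It assumes the similarity is a Jaccard index over sets of labels, which is my assumption and not something confirmed in this thread. The smell names and the `jaccard` helper are made up for illustration.

```python
def jaccard(predicted, truth):
    """Jaccard index between two label collections (assumed metric, see above)."""
    p, t = set(predicted), set(truth)
    if not p and not t:
        return 1.0  # two empty sets are treated as identical
    return len(p & t) / len(p | t)


# Hypothetical ground truth with only 2 smells.
truth = ["fruity", "sweet"]

# Predicting exactly the 2 confident labels.
score_two = jaccard(["fruity", "sweet"], truth)

# Padding the prediction with a 3rd, unlikely label that does not match.
score_three = jaccard(["fruity", "sweet", "woody"], truth)

print(score_two, score_three)
```

Under that assumption the padded prediction scores lower, because the extra label enlarges the union without adding to the intersection. If the real metric is different, the numbers change, but any set-overlap measure would behave similarly.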
Currently, in round one, we are asked to predict 1 to 3 tags per sentence.
We can predict up to 5 sentences.