@robert_allaway: Thanks for pointing this out. Some sentences repeat odor words because many low-frequency odor words were replaced with their parent categories in the first version of the dataset. In such cases, all repetitions can safely be treated as a single instance of that odor word within the sentence.
In the example above, the sentence for
C=CCS can be treated simply as
In future rounds of the competition, we will release the data without the low-frequency odor word replacement.
In any case, since each sentence represents a set of odor words, the representation does not affect the problem or the computation of the evaluation metric. We will, however, soon update the current dataset to remove the repetitions mentioned above.
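Until the updated dataset is out, the repetitions can be collapsed locally. Below is a minimal sketch in Python. It assumes the odor words in a sentence are separated by commas, which may not match the actual file format, so adjust `sep` as needed. The function name `dedupe_odor_words` and the sample sentence are hypothetical and only for illustration; they are not taken from the dataset.

```python
def dedupe_odor_words(sentence: str, sep: str = ",") -> list:
    """Collapse repeated odor words in one sentence, keeping first-seen order.

    `sep` is an assumed delimiter; change it to match the real data.
    """
    # Split on the delimiter and drop surrounding whitespace and empty tokens
    words = [w.strip() for w in sentence.split(sep) if w.strip()]
    # dict.fromkeys preserves insertion order and discards later duplicates
    return list(dict.fromkeys(words))


# Hypothetical sentence with a repeated parent-category word
raw = "garlic,alliaceous,alliaceous,alliaceous,onion"
clean = dedupe_odor_words(raw)
print(clean)  # ['garlic', 'alliaceous', 'onion']

# The set of odor words is identical before and after deduplication
assert set(clean) == {w.strip() for w in raw.split(",")}
```

Since the set of words is unchanged, any evaluation that treats a sentence as a set should give the same result on the raw and the deduplicated data.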