Task 1 has query/product pairs with empty documents in both the training and test datasets

Hi all, I found some problems with the datasets for Task 1.

1st: The training data has annotated query/product pairs whose products are empty in all fields.



2nd: The same is observed in the test set, which is the major problem, since we have no basis for ranking these products.


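For anyone who wants to check this on their side, here is a minimal sketch of how the empty-product pairs could be found. It runs on small in-memory stand-ins rather than the real files, and the column names (`product_title`, `product_description`, `query`) as well as the helper `is_empty_product` are assumptions for illustration, not the actual schema.

```python
import csv
import io

# Toy stand-in for the product catalogue CSV; column names are assumptions.
CATALOGUE_CSV = """product_id,product_title,product_description
P1,Red mug,Ceramic 300 ml
P2,,
P3,   ,
"""

# Toy stand-in for the annotated query/product pairs (train or test).
PAIRS_CSV = """query,product_id
red mug,P1
blue mug,P2
green mug,P3
"""


def is_empty_product(row, id_col="product_id"):
    """True when every field except the id is missing or whitespace-only."""
    return all(
        not (value or "").strip()
        for key, value in row.items()
        if key != id_col
    )


catalogue = list(csv.DictReader(io.StringIO(CATALOGUE_CSV)))
empty_ids = {row["product_id"] for row in catalogue if is_empty_product(row)}

pairs = list(csv.DictReader(io.StringIO(PAIRS_CSV)))
empty_pairs = [p for p in pairs if p["product_id"] in empty_ids]

print(f"{len(empty_pairs)} of {len(pairs)} pairs point at an empty product")
```

To run it on the real data, the two `io.StringIO(...)` sources would be replaced with opened files, and the same check applies unchanged to the training and test pairs.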

3rd: Back in the training set, some Korean, Chinese and Arabic queries are marked as English.
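One rough way to surface such queries is to look at the Unicode character names in each query. The sketch below uses only the standard library; the `query_locale` column, the `"us"` value for English and the sample rows are assumptions. Note that the `CJK` prefix also matches Japanese kanji, so it only flags a script, not a language.

```python
import unicodedata

# Script prefixes as they appear at the start of Unicode character names.
NON_LATIN_PREFIXES = ("HANGUL", "CJK", "HIRAGANA", "KATAKANA", "ARABIC")


def non_latin_scripts(text):
    """Return the set of non-Latin script prefixes found in `text`."""
    found = set()
    for ch in text:
        name = unicodedata.name(ch, "")  # "" for unnamed characters
        for prefix in NON_LATIN_PREFIXES:
            if name.startswith(prefix):
                found.add(prefix)
    return found


# Hypothetical rows; in practice these would come from the training set.
rows = [
    {"query": "wireless mouse", "query_locale": "us"},
    {"query": "무선 마우스", "query_locale": "us"},
    {"query": "无线鼠标", "query_locale": "us"},
    {"query": "ماوس لاسلكي", "query_locale": "us"},
]

# Queries marked as English that contain non-Latin script characters.
suspicious = [
    r for r in rows
    if r["query_locale"] == "us" and non_latin_scripts(r["query"])
]

for r in suspicious:
    print(r["query"], sorted(non_latin_scripts(r["query"])))
```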

Let me know if you’re not able to reproduce and I’ll provide the code.


I've found the same issue as well.

Hi,

Has anybody from the AIcrowd Team had a chance to take a look at the issue?

I also found that in both the Task 1 and Task 2 training sets there are duplicate annotated query/product pairs with different “esci_label” values, including pairs annotated with both the “exact” and “irrelevant” labels.

Although these products have different "product_id"s, concatenating all of their fields from product_catalogue-v0.2.csv yields exactly the same description.
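A minimal sketch of that check, assuming a `query` column in the pairs and two illustrative catalogue fields (the real catalogue has its own set of columns, and `product_text` is a hypothetical helper): group the pairs by query plus concatenated product text, then keep the groups that carry more than one label.

```python
from collections import defaultdict

# Toy catalogue keyed by product_id; field names are assumptions.
catalogue = {
    "P1": {"product_title": "Red mug", "product_description": "Ceramic"},
    "P2": {"product_title": "Red mug", "product_description": "Ceramic"},
    "P3": {"product_title": "Blue mug", "product_description": "Glass"},
}

# Toy annotated pairs; P1 and P2 differ only in their product_id.
pairs = [
    {"query": "red mug", "product_id": "P1", "esci_label": "exact"},
    {"query": "red mug", "product_id": "P2", "esci_label": "irrelevant"},
    {"query": "red mug", "product_id": "P3", "esci_label": "irrelevant"},
]


def product_text(fields):
    """Concatenate all fields in a fixed (sorted) column order."""
    return " ".join((fields[k] or "").strip() for k in sorted(fields))


# Collect every label seen for the same (query, product text) combination.
labels_by_key = defaultdict(set)
for p in pairs:
    key = (p["query"], product_text(catalogue[p["product_id"]]))
    labels_by_key[key].add(p["esci_label"])

# A conflict is a combination annotated with more than one distinct label.
conflicts = {k: v for k, v in labels_by_key.items() if len(v) > 1}

for (query, text), labels in conflicts.items():
    print(query, "|", text, "|", sorted(labels))
```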

From task1:

From task2:
