🚀 Dataset Update: `v0.3` of the dataset has been released 🚀

Dear Participants,

We are happy to announce the release of v0.3 of the KDD Cup 2022 Amazon ESCI Challenge Dataset. You can download the latest version of the dataset from the Challenge Resources page.

This release addresses some of the issues raised by the participants in the ongoing round of the challenge.

The key changes are:

  • Each of the test sets of Task-1, Task-2 and Task-3 has been updated to remove the overlaps with the training datasets of the other tasks.
  • The sample submissions for Task-1 now only include the query-product pairs present in the Test set.
  • The evaluators for all the tasks have been updated to compute the scores on the clean Test Sets.
  • All the previous submissions have been re-evaluated using the clean test sets, and the scores on the leaderboard represent the scores on the updated test sets for all the tasks.

Please note that the Training sets and the Product Catalogue remain unchanged.

We will start accepting code submissions from 21st of June, 2022. Until then, you can continue submitting your predictions on the v0.2 test sets; however, the evaluators will score only the rows that are also present in the v0.3 test sets. We nevertheless recommend making future submissions using the test sets from the latest v0.3 release of the dataset.
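For anyone who wants to see in advance which of their v0.2 predictions will still be scored, here is a minimal sketch of the idea, not the evaluator's actual code. It assumes both files are CSVs keyed by an `example_id` column; the `esci_label` column name, the label values, and the in-memory sample data are hypothetical stand-ins for the real files.

```python
import csv
import io


def filter_to_v03(v02_rows, v03_example_ids):
    """Keep only prediction rows whose example_id appears in the v0.3 test set."""
    keep = set(v03_example_ids)
    return [row for row in v02_rows if row["example_id"] in keep]


# Tiny in-memory stand-ins for the real files (hypothetical contents).
# In practice you would open your own v0.2 predictions and the v0.3 test set.
v02_predictions = io.StringIO(
    "example_id,esci_label\n10,exact\n11,substitute\n12,irrelevant\n"
)
v03_test = io.StringIO("example_id\n10\n12\n")

rows = list(csv.DictReader(v02_predictions))
ids = [r["example_id"] for r in csv.DictReader(v03_test)]

filtered = filter_to_v03(rows, ids)
print(len(filtered), "of", len(rows), "rows kept")
```

Rows whose `example_id` was dropped in v0.3 simply fall out of the filtered list, which mirrors the "subset" behaviour described above.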

Best,
Mohanty

For the final submission, are we allowed to use data from other tasks for training?

Yes, you can use it for training.

Could you please show me how to submit my code?

When will it be possible to submit using the new (shorter) test dataset (currently not possible for e.g. Task 3)?

Hi @trenusch,

The new test dataset files are already available on the resources page along with this announcement.

Submissions made since the release are scored on the new dataset, and older submissions have been re-evaluated automatically.

We will start accepting code submissions from 21st of June, 2022. Until then, you can continue submitting your predictions on the v0.2 test sets; however, the evaluators will score only the rows that are also present in the v0.3 test sets.

Please let us know if you are still facing any issues accessing the new Task 3 dataset or submitting a solution on it.

When I created a submission for the new test dataset, I got the error (Error (exit code 1): AssertionError: Invalid Submission File. Expected a CSV file with 277044 rows and 2 columns) even though I am confident the file had the expected format. However, submitting a file containing 394367 rows is possible.

Hi @trenusch, please verify the example_id column in your submission.
It looks like you are using indices [0....n] instead of the example_id values that are required for a valid submission, which is why your submissions are failing.

You can check the sample submission for the correct example_id values as well.
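A quick local check along these lines may help catch the problem before uploading. This is only a sketch under assumptions: the helper name `check_example_ids` is hypothetical, the ids are compared as strings as read from a CSV, and the real evaluator may apply different or additional checks.

```python
def check_example_ids(submission_ids, expected_ids):
    """Return a list of human-readable problems; an empty list means the ids match."""
    problems = []
    if len(submission_ids) != len(expected_ids):
        problems.append(
            f"expected {len(expected_ids)} rows, got {len(submission_ids)}"
        )
    missing = set(expected_ids) - set(submission_ids)
    unexpected = set(submission_ids) - set(expected_ids)
    if missing:
        problems.append(f"{len(missing)} expected example_id values are missing")
    if unexpected:
        problems.append(f"{len(unexpected)} example_id values are not in the test set")
    # Typical mistake: writing the row position 0..n-1 instead of the real example_id.
    positional = [str(i) for i in range(len(submission_ids))]
    if (missing or unexpected) and list(submission_ids) == positional:
        problems.append("ids look like positional indices 0..n-1, not example_id values")
    return problems


# Hypothetical ids, as they might be read from the sample submission and your own file.
expected = ["20", "35", "41"]
submitted = ["0", "1", "2"]

for problem in check_example_ids(submitted, expected):
    print(problem)
```

Reading the `example_id` column from the sample submission and from your own file, then passing both lists to the helper, should show whether the ids line up.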

Thank you! I was just confused by the error message.
