🚀 Datasets Released & Submissions Open 🚀

Hi @jacques_peeters, that’s weird, and thanks for letting us know. :raised_hands:

We will check and also upload compressed versions as soon as possible. Meanwhile, you can use “Save link as” if your browser isn’t saving the file by default.

I should have been smarter and tried “Save Link As…”, thank you :slight_smile:

Maybe it is a Chrome extension on my side :thinking:

Hi guys,

Does anyone know how to download the dataset with wget rather than through the download button? E.g.
wget AIcrowd
?

Just click the download button and copy the link from the browser’s downloads page; it should be an AWS link.
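
For anyone who wants to script that step, here is a minimal sketch. It assumes the copied link is a long AWS URL with a query string; `fetch_dataset` is a hypothetical helper name and the URL in the usage comment is a placeholder, not a real link. Quoting the URL matters because such links usually contain `&`, which the shell would otherwise treat as a command separator.

```shell
# Hypothetical helper; not part of any AIcrowd tooling.
# $1 = link copied from the browser's downloads page
# $2 = local file name to save as
fetch_dataset() {
  # --continue resumes a partial download instead of starting over.
  # --output-document gives the file a clean name, since the URL
  # itself ends in a long query string.
  wget --continue --output-document="$2" "$1"
}

# Usage (placeholder URL; paste your own link inside the quotes):
# fetch_dataset "https://<copied-aws-link>" train-v0.1.csv
```

Copied links of this kind may expire after a while, so if a resumed download is rejected, copying a fresh link from the download button is the likely fix.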

It is too slow to download.

What is the maximum number of submissions allowed for one task?

Hi @running, @good-good-study, others,

You can also use the AIcrowd CLI to download the datasets if you prefer the terminal. :computer:

To list all the files for this challenge:

❯ aicrowd dataset list --challenge esci-challenge-for-improving-product-search

                                     Datasets for challenge #1031
┌────┬──────────────────────────────────────────────────────────────────────┬─────────────┬───────────┐
│ #  │ Title                                                                │ Description │      Size │
├────┼──────────────────────────────────────────────────────────────────────┼─────────────┼───────────┤
│ 0  │ Task 1: Query Product Ranking/product_catalogue-v0.1.csv             │ -           │   1.06 GB │
│ 1  │ Task 1: Query Product Ranking/sample_submission-v0.1.csv             │ -           │   1.23 MB │
│ 2  │ Task 1: Query Product Ranking/test_public-v0.1.csv                   │ -           │ 247.81 KB │
│ 3  │ Task 1: Query Product Ranking/train-v0.1.csv                         │ -           │  42.20 MB │
│ 4  │ Task 2: Multiclass Product Classification/product_catalogue-v0.1.csv │ -           │   2.13 GB │
│ 5  │ Task 2: Multiclass Product Classification/sample_submission-v0.1.csv │ -           │   7.00 MB │
│ 6  │ Task 2: Multiclass Product Classification/test_public-v0.1.csv       │ -           │  17.86 MB │
│ 7  │ Task 2: Multiclass Product Classification/train-v0.1.csv             │ -           │  96.16 MB │
│ 8  │ Task 3: Product Substitute Identification/product_catalogue-v0.1.csv │ -           │   2.13 GB │
│ 9  │ Task 3: Product Substitute Identification/sample_submission-v0.1.csv │ -           │   8.09 MB │
│ 10 │ Task 3: Product Substitute Identification/test_public-v0.1.csv       │ -           │  17.86 MB │
│ 11 │ Task 3: Product Substitute Identification/train-v0.1.csv             │ -           │ 106.44 MB │
└────┴──────────────────────────────────────────────────────────────────────┴─────────────┴───────────┘

To download all the files:

❯ aicrowd dataset download --challenge esci-challenge-for-improving-product-search

To download selected files:

# Using a wildcard or file names; the command below downloads all of Task 1's files
❯ aicrowd dataset download --challenge esci-challenge-for-improving-product-search "Task 1*"

# Using the ID shown in the table when listing
❯ aicrowd dataset download --challenge esci-challenge-for-improving-product-search 1

And of course, to install the AIcrowd CLI :wink:, you can run:

❯ pip install -U aicrowd-cli

Welcome to KDD Cup, and we hope to see your submissions soon! :rocket:


Hi @shuliang, the files are hosted on S3, so an issue on the server side is unlikely.
Can you try a different ISP or network in case yours is throttling the download speed? :cry:

We will also release compressed versions soon to make the datasets more accessible.
Please keep the feedback coming! :hugs:

Hi @Roundrobin, you can make 5 submissions per task per day per team.

Hi @shivam, for Task 1 I see that query IDs are overlapping. Is that expected? See the attached screenshot for reference.
The text of query_id=0 overlaps with query_id=1, and so on.
Screenshot 2022-03-29 at 4.29.08 PM


Hi @shreyansdhankhar, we are looking into it at the moment; stay tuned for updates! :innocent:

Cool, I will wait for the update then!

@shivam: for each query, do we need to recommend the top-10 product IDs in Task 1?

Also, the test set for Task 1 contains many duplicates.
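
A quick way to check this locally is to count repeated rows from the terminal. This is only a rough sketch: `count_duplicate_rows` is a hypothetical helper, and it assumes each record sits on a single physical line (no quoted fields containing line breaks), which may not hold for every file in the dataset.

```shell
# Hypothetical helper: counts distinct data rows that occur more than once.
# Assumption: one record per line; the first line is a header and is skipped.
count_duplicate_rows() {
  # tail drops the header, sort groups identical rows together,
  # uniq -d prints one copy of each repeated row, wc -l counts them.
  tail -n +2 "$1" | sort | uniq -d | wc -l | tr -d ' '
}

# Usage (file name as shown in the dataset listing):
# count_duplicate_rows "test_public-v0.1.csv"
```

Note that this only catches rows that are identical in every column; rows that differ in an ID column but repeat the same query text would need a column-aware check.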


This should be addressed in the v2.0 release of the dataset.


Hi @mohanty @shivam,
I have some questions about the timeline and rules.

  1. Entry Deadline: July 15, 2022 at 00:00:00 UTC. I’d like to know the exact meaning. Which is correct: submit by July 14 at 23:59:59, or by July 15 at 23:59:59?
  2. Can we use the other tasks’ datasets? E.g., build a model for Task 1 by training on the datasets for Tasks 1-3.
  3. Can we use external data?
  4. Can we use public pre-trained models?
  5. When the admins or the AIcrowd platform run inference on the test dataset using my submission (code and model), is there a limit on computation time?

Hi! I want to make a code submission, but it says “I am not authorized to access this page”. You say I need to accept the challenge rules by clicking the Participate button, but I can’t find it. Could you help me find it?

Hi @olivertautz,

You can find the Participate button on the challenge page.
https://www.aicrowd.com/challenges/esci-challenge-for-improving-product-search

Thank you! I clicked it, but it still doesn’t work :frowning:

Is it maybe that only team leads can make code submissions?

Hi @olivertautz, that’s not the case; any team member can submit to the challenge (not only the team lead).

Can you please share the relevant URL (issue page, submission ID, etc.) and/or a screenshot?