🗂️ Using External Data: Guidelines for Data Declaration

Dear all,

This post aims to provide clarity regarding the use of external data. In order to promote responsible practices, we kindly request that you prioritize transparency when incorporating additional data from the Internet. It is important to document your methods clearly, adhere to ethical standards, and effectively communicate them.

Please take note of the following guidelines when using an external dataset for this challenge:

  1. When utilizing an existing public dataset, ensure it explicitly permits non-commercial or research use.

  2. Data sources word their terms differently. Whatever the wording, please ensure that the terms of use and the license comply with the guideline above.

Dataset Declaration:

If you collect your own dataset and use it for training, we kindly request that you make it available to other participants and organizers with a license that allows its use in this competition. You can accomplish this by sharing the link on this thread. Please complete this step by June 10th, 2023 23:55 UTC.

In the dataset declaration post, please provide justification for your use of the dataset in relation to participating in this challenge. Explicitly state that it does not violate the terms of use for the datasets.

For any additional questions regarding the dataset, please comment on this post.

Best of luck

All right, so we have to make our dataset available to other participants after the competition ends.

Wait.

Really?


Hi,

According to the challenge rules we signed before participating, we agreed to “release any code to scrape additional data for training”. This helps others reproduce results.

Please see the challenge rules for more details.

I almost overlooked the discussion forum. Please send emails to the competitors.


This is the external dataset used by our team.

@hansu How did you gather that data? We looked at the Amazon sites, and their terms of use do not allow data scraping.

@snehananavati Do pretrained model weights (for instance, from the Hugging Face site) count as datasets?


I have escalated this query to the organiser and will provide an update as soon as I receive a response.

If the pre-trained model is open-sourced under a license such as Apache 2.0, it is acceptable. The requirement is that, for any method used, Amazon Search should have the opportunity to learn from it and potentially utilise it for production. If the models are open-sourced under a suitable license, they meet the criteria.