We do not provide a training dataset for this competition. Participants can use publicly available data under a common-use license or other public research datasets such as Products10K.
Please note the following points when using an external dataset for this challenge:
Please use images with a Creative Commons licence. If you use images from a browser search or Google Images, please ensure they carry a Creative Commons licence.
When using an existing public dataset, ensure that it explicitly states that it can be used for non-commercial or research purposes. For example: Products10K.
Depending on the data source, other terms might be used. Please ensure the terms of use and licensing abide by the above-stated points.
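For anyone collecting their own images, one way to apply the points above is to record the licence of every image at collection time and drop anything that is not clearly Creative Commons before training. The snippet below is only a minimal sketch: the `ImageRecord` structure, the `ALLOWED_LICENSE_PREFIXES` list and the function names are hypothetical and not part of any official tooling for this challenge, and which licence variants are actually acceptable should be confirmed with the organisers.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical allow-list of normalised licence prefixes (an assumption,
# not an official list from the organisers).
ALLOWED_LICENSE_PREFIXES = ("cc0", "cc-by")


@dataclass
class ImageRecord:
    url: str
    license: str  # licence string recorded at collection time; "" if unknown


def is_creative_commons(license_name: str) -> bool:
    """Return True if the recorded licence string looks like a CC licence."""
    normalized = license_name.strip().lower().replace(" ", "-")
    return normalized.startswith(ALLOWED_LICENSE_PREFIXES)


def filter_licensed(records: List[ImageRecord]) -> List[ImageRecord]:
    """Keep only records whose licence passes the check above."""
    return [r for r in records if is_creative_commons(r.license)]
```

Images with a missing or unrecognised licence string are dropped by this check, which is the safer default when the source is a general web search.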
Declaring Dataset
If you collect your own dataset and use it for training, we request that you make it available to other participants and organisers under a license that allows it to be used in this competition. You can do so by sharing the link on this thread. This needs to be done by April 9th, 2023.
For any further queries regarding the dataset, please comment on this post.
Yes, attaching a public link would be sufficient. If the pre-trained models are available in popular libraries like timm or torchvision, a link to them will also suffice.
Does this mean we need to publish the pre-trained models we used, even though they are publicly available?
I thought we needed to publish datasets or models only if they are not publicly available. For example, if someone collects data via web scraping or uses a pre-trained model that is not publicly available.
Link to DyML Product doesn’t work for me.
Also, web scraping is a very sensitive subject, since you need to ensure that every image comes with a Creative Commons license.
As mentioned above, the Shopee dataset was allowed only for that Kaggle competition.