📥 Guidelines for Using External Datasets

Hello all,

We provide a development test set with images and ground truth labels to simplify debugging and enable local validation for this challenge.

We do not provide a training dataset for this competition. Participants may use publicly available data released under a common-use license, or public research datasets such as Products10K.

Please note the following points while using an external dataset for this challenge:

  • Please use images with a Creative Commons license. If you use images from a web search or Google Images, please ensure they carry a Creative Commons license.

  • When using an existing public dataset, ensure that its terms explicitly permit non-commercial or research use (for example, Products10K).

  • Depending on the data source, the terms may be worded differently. Please ensure that the terms of use and licensing comply with the points above.

Declaring Your Dataset

If you collect your own dataset and use it for training, we request that you make it available to the other participants and the organisers under a license that allows it to be used in this competition. You can do so by sharing a link in this thread. This must be done by April 9th, 2023.

For any further queries regarding the dataset, please comment on this post.

All the best!

Which training datasets do people using transfer learning need to publish? Should we attach a link to the training set instead, given its size?

Yes, attaching a public link would be sufficient. If the pre-trained models are available in popular libraries like timm or torchvision, a link to them will also suffice.

Hi @snehananavati

Does this mean we need to publish the pre-trained models we used, even though they are publicly available?

I thought we only needed to publish datasets or models that are not publicly available, for example, data collected via web scraping or a pre-trained model that is not publicly available.

How can I post the link to my dataset? I can’t see any link or form attached to this post.

All the datasets I used:
  1. https://www.pinlandata.com/rp2k_dataset/
  2. DeepFashion Database
  3. Shopee - Price Match Guarantee | Kaggle
  4. Alibaba goods dataset | Kaggle

@long_nguyen_hoang Are we allowed to use Shopee? It’s licensed as competition-use only on Kaggle.

I’m not sure. But I saw that the rule for this dataset is “Competition Use Only”, and we are in a competition too, so I think we are allowed to use it :wink:

Datasets that we used for transfer learning:

Shopee
MET Artwork Dataset
Alibaba goods
H&M Personalized Fashion
GPR1200
Deep Fashion - Consumer-to-shop Clothes Retrieval Benchmark part
DyML Product
Stanford Online Products
Our custom dataset from web scraping

Models: only from huggingface/pytorch-image-models (timm) on GitHub

I used the following datasets:

  1. https://products-10k.github.io/
  2. SOP

And I will use these in the future:
Amazon
AliProducts

The link to DyML Product doesn’t work for me.
Also, web scraping is a very sensitive subject, since you need to ensure that every image comes with a Creative Commons license.
As mentioned above, the Shopee dataset was allowed only for that Kaggle competition.

Hmm, indeed the link is not working, but it worked two months ago. Perhaps the organizers will tell you what to do in such a situation.

I’m using the following dataset:

Pretrained models:

At the bottom of the main page of the CVPR 2021 AliProducts Challenge: Large-scale Product Recognition (an Alibaba Cloud Tianchi algorithm competition), it says “If you find the dataset is helpful in your research, please consider citing our paper:”, which implies it can be used for research purposes. One can download the dataset via the links in AI_Product-Competition/get_dataset_AiProducts.sh (commit 4464263490143d0376a344520b0717ee91b0ff7b) in the pengxiaoxiao/AI_Product-Competition repository on GitHub.
@snehananavati Is it OK to use?

I used the following datasets:

And pre-trained models:

I used the following datasets:
Products10k
DeepFashion
Amazon dataset
Alibaba goods
Models:
timm

I used the Products10K and Shopee datasets.

Also pretrained CLIP models from Hugging Face.

Hi, we used the VGG16 model pre-trained on the ImageNet dataset. A subset of the dataset is available at ImageNet Object Localization Challenge | Kaggle.

I’m using the LAION CLIP model with the Products10K dataset.