Which training datasets do people using transfer learning need to publish? Can we attach a link to the training set instead, because of its size?
Yes, attaching a public link would be sufficient. If the pre-trained models are available in popular libraries like timm or torchvision, a link to them will also suffice.
Does this mean we need to publish the pre-trained models we used, even though they are publicly available?
I thought we only need to publish datasets or models if they are not publicly available. For example, if someone collects data via web scraping or uses a pre-trained model that is not publicly available.
How can I post the link to the dataset? I can’t see any link to a form attached to this post.
I’m not sure. But I saw that the rule for this dataset is “Competition Use Only”, and we are in a competition too. So I think we are allowed to use it.
Datasets that we used for transfer learning:
Shopee
MET Artwork Dataset
Alibaba goods
H&M Personalized Fashion
GPR1200
Deep Fashion - Consumer-to-shop Clothes Retrieval Benchmark part
DyML Product
Stanford Online Products
Our custom dataset from web scraping
The link to DyML Product doesn’t work for me.
Also, web scraping is a very sensitive subject, since you need to ensure that every image comes with a Creative Commons license.
As mentioned above, the Shopee dataset was allowed only for that Kaggle competition.
Hmm, indeed the link is not working, but two months ago it worked. Perhaps the organizers will tell you what to do in such a situation.
At the bottom of the main page of the CVPR 2021 AliProducts Challenge: Large-scale Product Recognition (Alibaba Cloud Tianchi), it says “If you find the dataset is helpful in your research, please consider citing our paper:”, which implies it can be used for research purposes. One can download the dataset via the links in get_dataset_AiProducts.sh in the pengxiaoxiao/AI_Product-Competition repository on GitHub (commit 4464263490143d0376a344520b0717ee91b0ff7b).
@snehananavati Is it ok to use?
I used the following datasets:
- Shopee - Price Match Guarantee | Kaggle
- H&M Personalized Fashion Recommendations | Kaggle
- https://products-10k.github.io/
And pre-trained models:
I used the Products-10K and Shopee datasets.
Also pretrained CLIP models from Hugging Face.
Hi, we used the VGG16 model pre-trained on the ImageNet dataset. A subset of the dataset is available at ImageNet Object Localization Challenge | Kaggle
I’m using the LAION CLIP model with the Products-10K dataset.
We have tried:
rp2k
JD_Products_10K
Shopee
Aliproducts
DeepFashion_CTS
DeepFashion2
Fashion_200K
Stanford_Products
Right now we are using only Products_10K, and models from OpenCLIP.
Hi @dipam and @snehananavati
I have a question about external datasets. I understand that for training models, images need to have a Creative Commons license, such as the Kaggle competition datasets (e.g., Shopee). However, these datasets are easily accessible to others and are often used by people in their personal projects or papers, even if they didn’t participate in the competition. I’m curious about how strict the organizers will be regarding the use of external data. Private data or datasets with broken download links are not acceptable because they are not accessible to the public. However, for datasets that are easily accessible, such as those found in Kaggle competitions, I believe they should be acceptable for use.