As required by the competition rules, I am sharing the external data I used for my competition entries. Other participants are welcome to share their data in this thread as well.
I have used the following external datasets:
iNaturalist 2021 dataset:
A custom subset of iNaturalist images, covering many mosquito species as well as other species, downloaded from inaturalist-open-data:
A CSV file containing the path, URL, and species name of each image:
To download the images, use a download manager such as aria2. Be careful: a lot of space is required (440 GB), which is why I’m sharing the links rather than re-uploading the data.
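For anyone who wants to feed the CSV to aria2, here is a minimal sketch of how it could be turned into an aria2 input file. It assumes the CSV has a header row with columns named `path` and `url`; those names, the function name `write_aria2_input`, and the file names are my assumptions, so adjust them to match the actual file.

```python
import csv


def write_aria2_input(csv_path, out_path, url_col="url", path_col="path"):
    """Convert CSV rows into an aria2 input file and return the entry count.

    Assumption: the CSV has a header row containing `url_col` and `path_col`.
    aria2 input files list one URL per line, followed by indented
    per-download options such as `out=` (the target file name).
    """
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            url = (row.get(url_col) or "").strip()
            rel_path = (row.get(path_col) or "").strip()
            if not url or not rel_path:
                continue  # skip incomplete rows instead of writing a broken entry
            dst.write(f"{url}\n  out={rel_path}\n")
            count += 1
    return count


# Example (hypothetical file names):
#   write_aria2_input("inat_subset.csv", "downloads.txt")
# then, on the command line:
#   aria2c -i downloads.txt -d images -j 16 -c
```

Here `-i` reads the input file, `-d` sets the download directory, `-j` sets the number of concurrent downloads, and `-c` resumes partially downloaded files, which helps if a download of this size gets interrupted.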
The license is image-specific, but it is generally either public domain or some form of Creative Commons. The bulk of the images have CC BY-NC or CC BY-SA licenses. I’m not a lawyer, but I assume using them for non-commercial machine learning models is fair use.
Justification for selection:
To address the small number of samples in some minority classes and to learn more discriminative features for insect classification.
@MPWARE Really? That’s impressive if you got such a high score with only the provided data! You’ve got to tell us how you achieved that after the competition!
@OverWhelmingFit @tfriedel Are we sure that your external data does not contain any subset of Mosquito Alert images (outside the data provided in the train dataset)? That is forbidden by the rules.
@MPWARE
iNaturalist is a separate app from Mosquito Alert, so any image taken directly with the app will be different from images taken with the Mosquito Alert app. That said, it’s also possible to upload files you have stored locally, probably in both apps. It would be difficult to rule out images that users have uploaded to both apps in this way, especially if you don’t have the Mosquito Alert image dataset. I guess if this happens rarely it will not be a big deal.