‼️ External Dataset: 🗂️ Notice on Usage & Declaration

Hello all,

This post aims to provide clarity regarding the use of external data. In order to promote responsible practices, we kindly request that you prioritize transparency when incorporating additional data from the Internet. It is important to document your methods clearly, adhere to ethical standards, and effectively communicate them.

Please take note of the following guidelines when using an external dataset for this challenge:

  1. Transparency: The Participant agrees to maintain complete transparency when incorporating additional data from external sources. This includes clear documentation of methods used and adherence to ethical standards.

  2. External Dataset Usage: When using an external public dataset, the Participant must ensure that the dataset allows non-commercial or research use. The Participant must comply with the dataset's terms of use and license, which may vary by data source. For MosquitoAlert specifically, any publicly available MosquitoAlert data other than the datasets released as part of this competition's resources is not a valid external data source, and Participants may not use it in preparing their solution for any phase of the competition.

  3. Dataset Declaration: If the Participant collects their own dataset for training purposes, it must be made available to other participants and organizers under a license that permits use in this competition. A link to the dataset should be shared before the end of the challenge, by 12th October 2023.

  4. Use Justification: When declaring the use of a dataset, the Participant must provide an explanation for its use in relation to this challenge. The Participant must explicitly state that the usage does not violate any terms of use for the dataset.

If you collect your own dataset and use it for training, we kindly request that you make it available to other participants and organizers with a license that allows its use in this competition. You can accomplish this by sharing the link on this thread. Please complete this step by October 12th, 2023 23:55 UTC.

In the comments of this post, please justify your use of the dataset in relation to this challenge and include a link to access it. Explicitly state that your usage does not violate the dataset's terms of use.

To avoid inadvertently using MosquitoAlert data when downloading photos from GBIF, filter by rights holder:
df = df[df.rightsHolder!='Mosquito Alert']


Datasets used:

  • iNaturalist subset containing images from the six mosquito classes.
    Removed all images without an explicit license allowing non-commercial use, and removed larval-stage mosquitoes based on metadata.
    Google Colab
    inaturalist-mosquitos-cropped | Kaggle
  • GBIF subset containing images from the six mosquito classes.
    Removed all images published by iNaturalist or Mosquito Alert, and all images without an explicit non-commercial license.
    Removed larval-stage mosquitoes using CLIP encoders, and cropped using the baseline detector.
    Google Colab
    gbif-residual-cropped | Kaggle
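
For anyone wanting to apply a similar license filter, here is a rough sketch of what "explicit license" could mean in code. It is not the code from the notebooks linked above. It assumes the license field holds either a Creative Commons URL or a short code such as `cc-by-nc`, and it treats any explicit Creative Commons license or public-domain dedication as permitting non-commercial use. Both points are assumptions to check against the actual data and the terms of each license. The function name and sample records are hypothetical.

```python
from urllib.parse import urlparse


def allows_noncommercial_use(license_value):
    """Heuristic check for an explicit Creative Commons license.

    Returns False for missing, empty, or unrecognized values, so a
    record is only kept when its license is stated explicitly.
    """
    if not license_value:
        return False
    text = license_value.strip().lower()

    # Case 1: the license is given as a URL.
    if text.startswith(("http://", "https://")):
        parsed = urlparse(text)
        host = parsed.netloc
        if host.startswith("www."):
            host = host[len("www."):]
        return host == "creativecommons.org" and parsed.path.startswith(
            ("/licenses/", "/publicdomain/")
        )

    # Case 2: the license is given as a short code (assumed format).
    return text == "cc0" or text.startswith("cc-by")


# Hypothetical records for illustration only.
records = [
    {"id": "a", "license": "http://creativecommons.org/licenses/by-nc/4.0/"},
    {"id": "b", "license": ""},
    {"id": "c", "license": "All rights reserved"},
    {"id": "d", "license": "CC-BY-NC"},
]

kept_ids = [r["id"] for r in records if allows_noncommercial_use(r.get("license"))]
print(kept_ids)  # prints ['a', 'd']
```

Anything the function does not recognize is dropped, on the reasoning that an unknown license should be checked by hand before use.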