Submission instructions

herve.goeau · March 19, 2020, 2:30pm

Hi, we have clarified the following points in the submission instructions:

format of the submitted run files + a short fake sample as an example
primary and secondary metric
the rules about the use of external training data
the maximum number of run files

Please have a look at the main page of the PlantCLEF2020 challenge, or just read below:
"

Submission instructions

As soon as the submission is open, you will find a “Create Submission” button on this page (next to the tabs).

Before being allowed to submit your results, you first have to press the red “Participate” button, which leads you to a page where you have to accept the challenge’s rules.

More practically, the run file to be submitted is a CSV file (with semicolon separators) and must contain as many lines as there are predictions. Each prediction is composed of an ObservationId (the identifier of a specimen, which can itself be composed of several images), a ClassId, a Probability and a Rank (used in case of equal probabilities). Each line should have the following format: <ObservationId;ClassId;Probability;Rank>

Here is a short fake run example respecting this format for only 3 observations: fake_run
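For illustration, a run file in the format described above could be produced roughly as follows. This is only a sketch: the function name `write_run`, the identifiers and the probabilities are made up, and it assumes that the file has no header line and that ranks start at 1 for each observation, neither of which is stated in the instructions.

```python
import csv
import io


def write_run(predictions, fileobj):
    """Write lines of the form ObservationId;ClassId;Probability;Rank.

    predictions: dict mapping an observation id to a list of
    (class_id, probability) pairs (hypothetical structure).
    """
    writer = csv.writer(fileobj, delimiter=";", lineterminator="\n")
    for obs_id, scored in predictions.items():
        # Sort by decreasing probability. sorted() is stable, so classes
        # with equal probabilities keep their input order; the Rank column
        # then makes that order explicit.
        ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
        for rank, (class_id, prob) in enumerate(ranked, start=1):
            writer.writerow([obs_id, class_id, prob, rank])


# Made-up values for two observations (assumption: no header line).
fake_predictions = {
    1001: [(20, 0.2), (10, 0.7), (30, 0.1)],
    1002: [(10, 0.5), (40, 0.5)],  # tie, resolved by the Rank column
}
buffer = io.StringIO()
write_run(fake_predictions, buffer)
run_lines = buffer.getvalue().splitlines()
print("\n".join(run_lines))
```

To write an actual file, the same function can be called with a file opened via `open(path, "w", newline="")` instead of the in-memory buffer.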

Evaluation criteria

The primary metric used for the evaluation of the task will be the Mean Reciprocal Rank (MRR). The MRR is a statistical measure for evaluating any process that produces a list of possible responses to a sample of queries, ordered by probability of correctness. The reciprocal rank of a query response is the multiplicative inverse of the rank of the first correct answer. The MRR is the average of the reciprocal ranks for the whole test set:

$$\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}$$

where |Q| is the total number of query occurrences in the test set.
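As a rough illustration of this definition, the MRR can be computed as in the sketch below. It is not the official evaluation script: the function name and the data structures are hypothetical, and it assumes that an observation whose correct class does not appear in the ranked list contributes a reciprocal rank of 0.

```python
def mean_reciprocal_rank(ranked_predictions, ground_truth):
    """Average of 1/rank of the first correct answer over all queries.

    ranked_predictions: dict observation id -> list of class ids,
        ordered from most to least probable (hypothetical structure).
    ground_truth: dict observation id -> correct class id.
    """
    total = 0.0
    for obs_id, true_class in ground_truth.items():
        reciprocal_rank = 0.0  # assumption: 0 when the class is never predicted
        for rank, class_id in enumerate(ranked_predictions.get(obs_id, []), start=1):
            if class_id == true_class:
                reciprocal_rank = 1.0 / rank
                break
        total += reciprocal_rank
    return total / len(ground_truth)  # |Q| = number of test observations


# Made-up example with three observations.
truth = {1: 10, 2: 20, 3: 30}
predictions = {
    1: [10, 11],  # correct at rank 1 -> 1
    2: [21, 20],  # correct at rank 2 -> 1/2
    3: [31, 32],  # never correct     -> 0
}
mrr = mean_reciprocal_rank(predictions, truth)
print(mrr)  # (1 + 1/2 + 0) / 3 = 0.5
```

The same function can be applied to a restricted set of observations simply by passing a `ground_truth` dict limited to those observations.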

The second metric will again be the MRR, but computed on the subset of observations related to the species with the fewest “in the field” photographs, based on the most comprehensive estimates possible from different data sources (IdigBio, GBIF, Encyclopedia of Life, Bing and Google Image search engines, previous datasets related to PlantCLEF and ExpertCLEF challenges).

As a general comment, we expect that classical ConvNet-based approaches using complementary training sets containing photos in the field, such as ExpertCLEF2019, in addition to the PlantCLEF2020 training set, will perform well on the primary metric. However, we expect that cross-domain approaches will get better results on the second metric, where there is a lack of in-the-field training photos.

Given the dominance of deep learning and transfer learning techniques, it is conceptually difficult to prohibit the use of external training data, notably the training data used during last year’s ExpertCLEF2019 challenge, or other pictures that can be found through GBIF, for example. However, we ask participants to provide at least one submission that uses only the training data provided this year.

Participants will be allowed to submit a maximum of 10 run files.
"