Evaluation Criteria

I was looking at the evaluation criteria (IoU). Could you please clarify, @mohanty, whether the background no-food class (i.e. pixels with no segmentation) should be counted towards this metric, or only the food classes? In other words, should we exclude the pixels with no food mask from the metric?

@gloria_macia_munoz as far as the implementation goes, the background class is also considered in the IoU calculation.

https://www.jeremyjordan.me/evaluating-image-segmentation-models/, https://towardsdatascience.com/metrics-to-evaluate-your-semantic-segmentation-model-6bcb99639aa2
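
For anyone who wants to sanity-check this locally, here is a minimal sketch of a mean-IoU computation that includes the background class. It is not the official evaluation code; the label maps and num_classes are assumptions.

# Minimal sketch (not the official evaluation code): mean IoU over all
# classes, treating class 0 as the background / no-food class.
import numpy as np

def mean_iou(pred, gt, num_classes):
    """pred, gt: 2-D integer label maps of the same shape; class 0 = background."""
    ious = []
    for c in range(num_classes):           # background (0) is included
        pred_c = pred == c
        gt_c = gt == c
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:                      # class absent in both -> skip it
            continue
        inter = np.logical_and(pred_c, gt_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))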

Thanks a lot @nikhil_rayaprolu. I will include all the pixels when calculating IoU.

Is there an example submission file, i.e. something that specifies the output format of the predictions?
Having an example for just a couple of images would help a lot to see things like:
should we provide the pixel segmentation mask for the original or the resized test images? Which code should each food class take so it is consistent with your evaluation (e.g. the category id)?

@gloria_macia_munoz the output format (annotations) and the related details are available at https://github.com/AIcrowd/food-recognition-challenge-starter-kit/blob/master/Dataset%20Utils.ipynb
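
Independently of that notebook, here is a minimal sketch of loading these annotations with pycocotools; pycocotools itself and the annotation file path below are assumptions, not part of the starter kit.

# Minimal sketch, assuming pycocotools and a local copy of the training
# annotations; the file path is illustrative only.
from pycocotools.coco import COCO

coco = COCO("data/train/annotations.json")    # hypothetical path
image_ids = coco.getImgIds()
anns = coco.loadAnns(coco.getAnnIds(imgIds=image_ids[0]))
for ann in anns:
    mask = coco.annToMask(ann)                # polygon -> binary pixel mask
    print(ann["category_id"], ann["bbox"], mask.shape)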

Thanks @nikhil_rayaprolu, but it is still not quite clear. A single annotation (e.g. from the training set) has the following structure:

{
  "id": 409190,
  "image_id": 48034,
  "segmentation": [
    [
      94,
      78,
      32,
      78,
      32,
      25,
      94,
      25,
      94,
      78
    ]
  ],
  "area": 3286.0,
  "bbox": [
    32,
    32,
    62,
    62
  ],
  "category_id": 100,
  "iscrowd": 0
}

For the test set, I only have a folder with images (.jpg) to segment, so not all of the above fields will be available, e.g. iscrowd. What does the output JSON look like? Participants need to know how to save the segmentation masks. Thanks a lot in advance!


Hi @gloria_macia_munoz,

Yes, the structure shared by you is correct. You can ignore the iscrowd field.

An example of the required final structure is as follows:

[
  {
    "image_id": 28902,
    "category_id": 2738,
    "score": 0.18888643674121008,
    "segmentation": [
      [
        270,
        195,
        381,
        823,
        56,
        819,
        527,
        [....]
      ]
    ],
    "bbox": [
      56,
      165,
      678,
      658
    ]
  },
  [....]
]

Please let us know in case there are any follow-up questions. All the best with the challenge! :smiley:
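
To make the structure above concrete, here is a rough sketch (not the official tooling) of writing such a predictions file from per-instance binary masks. The mask-to-polygon step uses OpenCV contours, and the function name, the predictions iterable, and the output path are illustrative assumptions.

# Rough sketch only: turn predicted binary masks into the submission format
# shown above -- polygon segmentation, bbox as [x, y, w, h], plus a score.
import json
import cv2          # assumes OpenCV >= 4, where findContours returns 2 values
import numpy as np

def masks_to_submission(predictions, out_path="predictions.json"):
    """predictions: iterable of (image_id, category_id, score, binary_mask)."""
    results = []
    for image_id, category_id, score, mask in predictions:
        mask_u8 = (mask > 0).astype(np.uint8)
        contours, _ = cv2.findContours(mask_u8, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        # each contour becomes one flat [x1, y1, x2, y2, ...] polygon
        segmentation = [c.flatten().tolist() for c in contours if len(c) >= 3]
        if not segmentation:
            continue
        x, y, w, h = cv2.boundingRect(mask_u8)   # bbox of the non-zero pixels
        results.append({
            "image_id": int(image_id),
            "category_id": int(category_id),
            "score": float(score),
            "segmentation": segmentation,
            "bbox": [int(x), int(y), int(w), int(h)],
        })
    with open(out_path, "w") as f:
        json.dump(results, f)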


Thanks a lot @shivam, slowly getting closer. My only remaining doubts are: I guess image_id is the filename without the extension (.jpg), and score must be the confidence with which the corresponding category was predicted...? The rest is clear. :wink:


Yes @gloria_macia_munoz, you are correct about the image_id and score fields. We will also work on adding this information to the starter kit so it is easier for new participants.

cc: @nikhil_rayaprolu

I have a question about the score field. I understand that it's the confidence of the prediction, but how is it used in the computation of the precision and recall results?

As far as I understand, the IoU is computed by comparing the correct ground-truth category with the submitted (top) prediction, and then precision and recall are computed from true positives, false positives and false negatives (with an IoU of 0.5 as the cutoff for a correct prediction). With these calculations, the confidence score shouldn't matter (because only the predicted category for each pixel gets compared with the ground-truth category)? Or did I perhaps miss something? Is the confidence score maybe used to weight the results, or something along those lines?

@simon_mezgec I completely agree with your point. From reading the link above, I also understand that the segmentation is to be provided in polygon format and the bbox as [X, Y, W, H], but clarification on these kinds of things would be very useful, @shivam.

We use the official cocoeval for evaluation; the code for it is available at: https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py
There you can search for the 'score' key and check how it is used during evaluation.
If you look closely, the scores are not used in the calculation formulae; they are only used for sorting.
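
For reference, here is a minimal sketch of running that same cocoeval locally on a predictions file in the format above; pycocotools itself and both file paths are assumptions.

# Minimal local-evaluation sketch, assuming pycocotools and illustrative paths.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("ground_truth_annotations.json")   # hypothetical path
coco_dt = coco_gt.loadRes("predictions.json")     # file in the submission format
coco_eval = COCOeval(coco_gt, coco_dt, iouType="segm")
coco_eval.evaluate()    # detections are sorted by score before matching
coco_eval.accumulate()
coco_eval.summarize()   # prints AP / AR, including AP at IoU=0.50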


@nikhil_rayaprolu - thanks for the answer! Interesting, so the 'score' field is only used to sort the predictions by their confidence.

The way I do the classification requires the annotations to be in a pixel-map format (an image where each pixel's value is a number corresponding to a food category). This means two things:

  1. For the ground truth, I convert the annotations to label images where I have only one "prediction" per pixel, so there's no 'score' field left there.
  2. Similarly, the output of my model is a label image, where each pixel is labelled with the food category with the top confidence. Since I disregard all the lower-confidence predictions, I'm again left with only one prediction per pixel - no 'score' field here either.

So in my case, would it make sense to just assign a 1 to the 'score' field, since it's only used to sort the predictions? Would the fact that I'm not saving lower-confidence predictions make me lose accuracy (precision or recall) in any way?

Sorry for the confusion - I just want to make sure I'm doing this properly. :slight_smile:
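
For completeness, here is a rough sketch of that conversion under those assumptions (one label image per test image, a constant score of 1.0). The mask_to_coco_dict helper is hypothetical and stands in for a mask-to-polygon/bbox step like the one sketched earlier in the thread.

# Rough sketch of the conversion discussed above: one label image
# (one category id per pixel) becomes one prediction per category present,
# all with a constant score of 1.0. Names are illustrative.
import numpy as np

def label_map_to_results(label_map, image_id, mask_to_coco_dict):
    """mask_to_coco_dict(binary_mask) -> (segmentation, bbox); hypothetical helper."""
    results = []
    for category_id in np.unique(label_map):
        if category_id == 0:              # skip the background class
            continue
        binary_mask = (label_map == category_id).astype(np.uint8)
        segmentation, bbox = mask_to_coco_dict(binary_mask)
        results.append({
            "image_id": int(image_id),
            "category_id": int(category_id),
            "score": 1.0,                 # only used for sorting, so a constant is fine
            "segmentation": segmentation,
            "bbox": bbox,
        })
    return results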

Just a quick FYI - I'm currently using another approach (started off with the MMDetection baseline submission and modified it), so this is no longer needed.