Update to segmentation scoring metrics

Hi everyone,

We have changed how the metrics are calculated to better match standard practice in the segmentation literature. The IoU and Dice scores are now calculated per class over the entire dataset and then averaged across classes, instead of being calculated per image. As a result, most submissions are expected to get slightly higher scores.
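For intuition, here is a minimal sketch of the difference between the two approaches. This is not the official starter-kit code: the function names are made up for illustration, predictions and ground truth are assumed to be integer label maps, and classes that appear in neither are assumed to be skipped when averaging.

```python
import numpy as np

def dataset_level_scores(preds, targets, num_classes):
    """Accumulate per-class pixel counts over all images, then average over classes."""
    intersection = np.zeros(num_classes, dtype=np.int64)
    pred_area = np.zeros(num_classes, dtype=np.int64)
    target_area = np.zeros(num_classes, dtype=np.int64)
    for pred, target in zip(preds, targets):
        for c in range(num_classes):
            p = pred == c
            t = target == c
            intersection[c] += np.logical_and(p, t).sum()
            pred_area[c] += p.sum()
            target_area[c] += t.sum()
    union = pred_area + target_area - intersection
    # Assumption: classes absent from both prediction and ground truth are ignored.
    present = union > 0
    iou = intersection[present] / union[present]
    dice = 2 * intersection[present] / (pred_area[present] + target_area[present])
    return float(iou.mean()), float(dice.mean())

def per_image_scores(preds, targets, num_classes):
    """Score each image on its own, then average over images."""
    ious, dices = [], []
    for pred, target in zip(preds, targets):
        iou, dice = dataset_level_scores([pred], [target], num_classes)
        ious.append(iou)
        dices.append(dice)
    return float(np.mean(ious)), float(np.mean(dices))

# Toy data: two 2x2 label maps with classes 0 and 1.
preds = [np.array([[0, 1], [1, 1]]), np.array([[0, 0], [0, 0]])]
targets = [np.array([[0, 1], [0, 1]]), np.array([[0, 0], [0, 1]])]

new_iou, new_dice = dataset_level_scores(preds, targets, num_classes=2)
old_iou, old_dice = per_image_scores(preds, targets, num_classes=2)
print(f"dataset-level: IoU={new_iou:.3f}, Dice={new_dice:.3f}")
print(f"per-image:     IoU={old_iou:.3f}, Dice={old_dice:.3f}")
```

In this toy case the second image misses a one-pixel object entirely, which gives it a class score of zero and drags the per-image average down. Pooling counts over the whole dataset dilutes that effect, which is one reason the two approaches can give different scores.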

Updated code for evaluating your models locally is available in the starter-kit. All valid existing submissions will be re-evaluated within the next few days, and all future submissions will be evaluated with the updated metrics.