Why are the accuracy and F1 score always the same in the evaluation of submissions for task 2 and task 3?
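For single-label multiclass classification, micro-F1, micro-precision, micro-recall and accuracy are always the same, since exactly one label is predicted for each sample. Every misclassified sample therefore counts as one false positive (for the predicted class) and one false negative (for the true class), so the total numbers of false positives and false negatives are equal and all four metrics reduce to the fraction of correctly classified samples.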
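The equality can be checked by hand on a small example. The following is a minimal sketch, not the official evaluation script: the function name `micro_metrics` and the toy labels are made up for illustration, and it assumes non-empty input with exactly one gold label and one predicted label per sample.

```python
def micro_metrics(y_true, y_pred):
    """Return (accuracy, micro-precision, micro-recall, micro-F1).

    Hypothetical helper for illustration; assumes single-label
    multiclass data and non-empty, equal-length inputs.
    """
    tp = fp = fn = 0
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp += 1      # correct: a true positive for this class
        else:
            fp += 1      # a false positive for the predicted class
            fn += 1      # and a false negative for the gold class
    accuracy = tp / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Equivalent to 2PR / (P + R), written to avoid 0/0 when tp == 0.
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, precision, recall, f1


# Toy labels, invented for this example.
y_true = ["a", "b", "c", "a", "b", "c"]
y_pred = ["a", "c", "c", "a", "b", "a"]
accuracy, precision, recall, f1 = micro_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

Since `fp` and `fn` are always incremented together, `tp + fp` and `tp + fn` both equal the number of samples, which is why the four values coincide. This does not hold for macro-averaged scores, or for multi-label settings where a sample can receive more or fewer than one label.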