Is the scoring function F1 or logloss?

Looks like the leaderboard ranking is based on F1 instead of logloss as communicated.

Hi @yzhounvs,

The leaderboard is based on F1 as primary and logloss as secondary score.

Can you point us to the communication you are referring to above, so we can fix/discuss it there?

In Evaluation Criterion.


We will get the challenge page updated after communicating with organisers, and update here when it’s done. Till then please consider “F1 as primary and logloss as secondary score”.

@yzhounvs The miscommunication has been sorted out and you were correct. Log loss is the primary score and F1 score is secondary. The leaderboard has been fixed and the new rankings are listed accordingly.
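To make the "log loss primary, F1 secondary" ranking concrete, here is a minimal sketch of how such a leaderboard sort could work. The team names and scores are made up for illustration; the actual scoring pipeline is not shown in this thread.

```python
# Hypothetical leaderboard entries: (team, log_loss, f1_score).
# Log loss is primary (lower is better); F1 breaks ties (higher is better).
entries = [
    ("team_a", 0.42, 0.81),
    ("team_b", 0.39, 0.78),
    ("team_c", 0.39, 0.85),
]

# Sort ascending by log loss, then descending by F1 for tied log losses.
ranking = sorted(entries, key=lambda e: (e[1], -e[2]))

for rank, (team, log_loss, f1) in enumerate(ranking, start=1):
    print(f"{rank}. {team}: log_loss={log_loss:.2f}, f1={f1:.2f}")
```

Here team_b and team_c tie on log loss, so team_c ranks first on its higher F1.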

When we sat down together as a team, we realized that we are not sure, at all, whether it will be the logLoss of the final submission or the best logLoss of any submission. Obviously, that makes a difference for how one does submissions. Could you clarify?

hi bjoern - right now it is the best log loss submission.

please keep in mind that in the test data we do have a hold out.

the final leaderboard will be the hold out test data plus the current test data. this would currently be evaluated on your top submitted model.

Hi, @kelleni2, how is the “top submitted model” determined?

Does it mean that the final leaderboard only evaluates the best performing submission based on the current public leaderboard? Or all submissions will be used on the whole test data to identify the top score for final evaluation?

It is the submission with the best score on half of the test dataset.

We already have scores against full dataset for all of your submissions (hidden), so all submissions will be used.
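A minimal sketch of the selection described above, assuming the "top submitted model" is the one with the best (lowest) log loss on the public half, and that its hidden full-dataset score is what the final leaderboard reports. All field names and numbers are hypothetical.

```python
# Each submission has a score on the public half of the test set and a
# hidden score on the full test set (public half + hold out).
submissions = [
    {"id": 1, "public_half_logloss": 0.45, "full_logloss": 0.47},
    {"id": 2, "public_half_logloss": 0.41, "full_logloss": 0.44},
    {"id": 3, "public_half_logloss": 0.43, "full_logloss": 0.40},
]

# The "top submitted model": best (lowest) log loss on the public half.
top = min(submissions, key=lambda s: s["public_half_logloss"])

# The final leaderboard would then report that submission's full-dataset score.
final_score = top["full_logloss"]
print(top["id"], final_score)
```

Note that in this toy example submission 3 actually generalizes best (0.40 on the full set) but is not selected, which is exactly the concern raised later in the thread.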


Can you confirm that all submissions will be considered for the final leaderboard? Or do we need to send something like a final submission?


Hi, I will let @kelleni2 confirm this from the organisers' point of view, given it is just a configurable setting on our side.

cc: @kelleni2, @satyakantipudi

Please confirm the policy for final scoring, i.e. will all submissions be considered, or only the one with the best score on the partial dataset?

From my point of view it would be painful to take the one with the best partial score, as we have another with almost the same partial score that we think could generalize better.

Completely agree.

  • Scoring the submission with the best partial score would be absurd, because teams have no control over designating what they think should be scored and can be penalized for an early attempt that happened to score well on the public leaderboard.

  • Taking the best one out of everything ever submitted of course just encourages an absurd shotgun approach.

  • Taking the last one submitted or the best one out of the last 5 or 10 submitted might be reasonable.

It would be really good to know what will be done and to know that it is some sensible approach.