Is the scoring function F1 or logloss?

yzhounvs · November 15, 2019, 3:30am

Looks like the leaderboard ranking is based on F1 instead of logloss as communicated.

shivam · November 15, 2019, 2:48pm

Hi @yzhounvs,

The leaderboard is based on F1 as primary and logloss as secondary score.

Can you point us to the communication you are referring to, so we can fix or discuss it there?

yzhounvs · November 16, 2019, 3:40am

In Evaluation Criterion.

shivam · November 17, 2019, 10:59pm

@yzhounvs,

We will get the challenge page updated after communicating with the organisers, and will post here when it’s done. Until then, please consider “F1 as primary and logloss as secondary score”.

shivam · November 18, 2019, 9:19pm

@yzhounvs The miscommunication has been sorted out and you were correct. Log loss is the primary score and F1 score is secondary. The leaderboard has been fixed and the new rankings are listed accordingly.
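
For anyone who wants to reproduce the ordering locally, here is a minimal sketch of ranking by log loss first and F1 second. It assumes a binary task and a 0.5 threshold for F1; neither is stated in this thread, and the function names are illustrative, not the platform's actual scoring code.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; lower is better."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def f1_score(y_true, y_prob, threshold=0.5):
    """F1 of the positive class; the 0.5 threshold is an assumption."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 1)
    fp = sum(1 for y, yp in zip(y_true, y_pred) if y == 0 and yp == 1)
    fn = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def rank(entries):
    """entries: list of (name, logloss, f1) tuples.

    Primary key: log loss ascending. Secondary key: F1 descending,
    which only matters when two entries tie on log loss.
    """
    return sorted(entries, key=lambda e: (e[1], -e[2]))
```

With this ordering, F1 acts purely as a tie-breaker: an entry with a lower log loss ranks higher regardless of its F1.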

bjoern.holzhauer · December 2, 2019, 3:11pm

When we sat down together as a team, we realized that we are not at all sure whether it will be the logLoss of the final submission or the best logLoss of any submission. Obviously, that makes a difference to how one does submissions. Could you clarify?

kelleni2 · December 3, 2019, 3:53pm

Hi bjoern, right now it is the submission with the best log loss.

Please keep in mind that the test data includes a holdout set.

The final leaderboard will be based on the holdout test data plus the current test data. As things currently stand, this would be evaluated on your top submitted model.

yzhong118 · December 4, 2019, 12:08am

Hi, @kelleni2, how is the “top submitted model” determined?

Does it mean that the final leaderboard only evaluates the best-performing submission based on the current public leaderboard? Or will all submissions be scored on the whole test data to identify the top score for the final evaluation?

shivam · December 4, 2019, 7:00am

It is your submission with the best score on half of the test dataset.

We already have scores against the full dataset for all of your submissions (hidden), so all submissions will be used.
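
One possible reading of the above, as a rough sketch: the entry is chosen by its score on the visible half of the test data, and the hidden full-dataset score of that same entry is what gets reported. The `Submission` class and its field names are hypothetical, and the numbers in the usage note are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Submission:
    order: int             # hypothetical submission sequence number
    public_logloss: float  # score on the visible half of the test data
    full_logloss: float    # hidden score on the full test data

def final_entry(submissions):
    """Pick the submission with the lowest public log loss.

    The selection never looks at full_logloss; that value is only
    what ends up on the final leaderboard for the chosen entry.
    """
    return min(submissions, key=lambda s: s.public_logloss)
```

For example, given `Submission(1, 0.40, 0.55)` and `Submission(2, 0.41, 0.45)`, this picks the first one, so a submission that is marginally better on the visible half can be chosen even though another does better on the full data.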

carlos.cortes · December 4, 2019, 7:46am

Hi,

Can you confirm that all submissions will be considered for the final leaderboard? Or do we need to send something like a final submission?

Thanks!
Carlos

shivam · December 4, 2019, 7:50am

Hi, I will let @kelleni2 confirm this from the organisers' point of view, given that it is just a configurable setting on our side.

shivam · December 18, 2019, 8:28am

cc: @kelleni2, @satyakantipudi

Please confirm the policy for final scoring: will all submissions be considered, or only the one with the best score on the partial dataset?

carlos.cortes · December 18, 2019, 8:34am

From my point of view, it would be painful to take the one with the best partial score, as we have another with almost the same partial score that we think could generalize better.

bjoern.holzhauer · December 18, 2019, 9:16am

Completely agree.

Scoring the submission with the best partial score would be absurd, because teams have no control over designating what they think should be scored and can be penalized for an early attempt that happened to be good on the public leaderboard.
Taking the best one out of anything ever submitted of course just encourages an absurd shotgun approach.
Taking the last one submitted or the best one out of the last 5 or 10 submitted might be reasonable.

It would be really good to know what will be done and to know that it is some sensible approach.
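
To make the options discussed above concrete, here is a small sketch comparing them on one team's submission history. The policy names, the tuple layout, and the scores in the usage note are all hypothetical; none of this reflects what the organisers actually configured.

```python
def pick(submissions, policy, n=5):
    """Return the (public_logloss, full_logloss) pair a policy would score.

    submissions: list of (public_logloss, full_logloss) tuples in
    submission order, oldest first. Lower log loss is better.
    """
    if policy == "best_public":
        # Best score on the partial (public) data, over all submissions.
        return min(submissions, key=lambda s: s[0])
    if policy == "best_full":
        # Best of anything ever submitted, judged on the full data.
        return min(submissions, key=lambda s: s[1])
    if policy == "last":
        # Only the most recent submission counts.
        return submissions[-1]
    if policy == "best_public_of_last_n":
        # Best public score among the last n submissions.
        return min(submissions[-n:], key=lambda s: s[0])
    raise ValueError(f"unknown policy: {policy}")
```

With a history such as `[(0.40, 0.55), (0.45, 0.50), (0.41, 0.45), (0.43, 0.47)]`, `best_public` scores the early first attempt (full log loss 0.55), while `best_public_of_last_n` with `n=2` scores the third submission (full log loss 0.45), which is the kind of difference the posts above are concerned about.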