I have a strong suspicion that the leaderboard totals are calculated incorrectly.
Could someone please confirm?
The scores for Team 1 suggest token counts that are achievable for FLAN, but not for 3.5T, as far as I can tell.
For example, a 12-token solution to #6 would score 119856 on 3.5T, but I do not believe 12 tokens is possible on that model.
The same 12-token solution should score 59928 on FLAN, which is exactly half.
Yet the leaderboard scores appear to suggest that a 12-token solution to #6 is possible on 3.5T.
So I think the leaderboard is either multiplying their score by 2 when it should not, or updating their ‘best score’ without taking the model into account.
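
To make the arithmetic concrete, here is a minimal sketch of what I suspect is happening. The two scores are the ones quoted above; the multiplier table and the `score` function are my own guesses, inferred only from the 2:1 ratio between those figures, and are not taken from the actual leaderboard code.

```python
# Figures quoted in this post for a 12-token solution to #6.
FLAN_SCORE_12_TOKENS = 59928
REPORTED_SCORE = 119856

# Assumed multipliers, inferred only from the ratio of the two figures above.
MODEL_MULTIPLIER = {"FLAN": 1, "3.5T": 2}


def score(base_score: int, model: str) -> int:
    """Hypothetical scoring step: scale a base score by the model's multiplier."""
    return base_score * MODEL_MULTIPLIER[model]


# Expected behaviour: a 12-token FLAN solution is scored as FLAN.
expected = score(FLAN_SCORE_12_TOKENS, "FLAN")

# Suspected bug: the same FLAN solution is scored as if it were 3.5T.
suspected = score(FLAN_SCORE_12_TOKENS, "3.5T")

print(expected, suspected, suspected == REPORTED_SCORE)
```

If the leaderboard stored the best result per level and per model, rather than per level alone, the FLAN token count could not end up with the 3.5T multiplier.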