Since scoring varies greatly by model, would it not be better to make rankings by model? If I’m completing the challenge with Flan, I’d like to see my progress in comparison to others completing the challenge with Flan.
We added a Flan-only leaderboard!
We need a ChatGPT-only leaderboard too.
I don’t believe this is necessary, since there is no special prize for it.