Regarding the maximum number of response tokens for Llama 3

wufanyou · April 25, 2024, 2:44am

I want to draw the organizers' attention to the fact that Llama 3 has a larger vocabulary (128K) than Llama 2 (32K). So the rules should clearly define which tokenizer is used to truncate the response (previously the code used the Llama 2 tokenizer).

Best
Fanyou

aicrowd_team · April 25, 2024, 5:28pm

@wufanyou : The starter kit already includes the tokenizer we use on the evaluator to limit the maximum number of tokens in the response.

wufanyou · April 25, 2024, 10:42pm

@aicrowd_team Yes, I understand that the code already includes this tokenizer. But Llama 3 has a different vocabulary size (128K vs. 32K). In some cases, the same output text will produce fewer tokens with the Llama 3 tokenizer than with the Llama 2 tokenizer. In terms of model performance, Llama 3 is better (according to the report), and I foresee that people might use it. So I suggest replacing the current tokenizer used for truncating predictions with Llama 3's.
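
To illustrate the concern, here is a minimal sketch of how the same token limit can keep different amounts of text depending on the tokenizer. It does not use the real Llama tokenizers or the evaluator's code: `make_chunk_tokenizer`, `truncate_response`, and the chunk sizes are hypothetical stand-ins, where longer chunks loosely imitate a larger vocabulary that covers the same text with fewer tokens.

```python
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]


def make_chunk_tokenizer(chunk_size: int) -> Tokenizer:
    """Build a toy tokenizer that splits text into fixed-size character chunks.

    This is NOT how Llama tokenizers work; it is only a stand-in so the
    example runs without downloading any model files.
    """
    def tokenize(text: str) -> List[str]:
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return tokenize


def truncate_response(text: str, tokenize: Tokenizer, max_tokens: int) -> str:
    """Keep at most max_tokens tokens of text, as counted by tokenize."""
    tokens = tokenize(text)
    if len(tokens) <= max_tokens:
        return text
    # The toy tokens are plain substrings, so joining them restores the text.
    return "".join(tokens[:max_tokens])


# Hypothetical stand-ins: shorter chunks for the smaller vocabulary,
# longer chunks for the larger one.
small_vocab_tokenizer = make_chunk_tokenizer(3)
large_vocab_tokenizer = make_chunk_tokenizer(6)

answer = "The capital of France is Paris."
limit = 5  # hypothetical token limit, chosen only for this example

kept_small = truncate_response(answer, small_vocab_tokenizer, limit)
kept_large = truncate_response(answer, large_vocab_tokenizer, limit)

print(len(small_vocab_tokenizer(answer)), repr(kept_small))
print(len(large_vocab_tokenizer(answer)), repr(kept_large))
```

With the same limit, the tokenizer that produces fewer tokens keeps more of the answer, which is why the choice of truncation tokenizer matters for the rules.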

graceyx.yale · May 11, 2024, 3:33am

Hi wufanyou,

We use the same tokenizer for both Llama 2 and Llama 3 models so that the comparison is fair.

Thanks,
The CRAG Team