@nha_nguyen_van : Yes, you can use the hf version of the llama family models.
The recently released baselines also use the same.
Can we use any embedding model as a sentence transformer in the baseline, as long as it's open source?
Yes, thanks. But I have a concern about scores on the leaderboard. Some teams have a missing value of approximately 0.8 even though n_miss=0… That makes the CRAG score low. Can you check again?
@nha_nguyen_van : Thanks for pointing it out. The missing and hallucination columns were swapped by mistake. It's fixed now.
I ran the local evaluation and set a lot of answers to "I don't know". According to the evaluation code, you check whether prediction == "i don't know" or prediction == "i don't known", and if so n_miss += 1. But I still receive n_miss = 0 when submitting, while locally I still have n_miss > 0. Also, in the submission log, some answers were not evaluated successfully; I only have 247/260 (95%) success in Evaluation Progress. Can you check this issue?
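For reference, the local miss-counting check described above can be sketched roughly as follows (a minimal sketch based on the strings quoted in this thread; the function name and normalization are assumptions, not the official evaluation code):

```python
def count_misses(predictions):
    """Count "I don't know"-style answers, mirroring the local
    evaluation check described above (a sketch, not the official code)."""
    # The two phrases quoted in the thread's description of the check.
    miss_phrases = {"i don't know", "i don't known"}
    n_miss = 0
    for prediction in predictions:
        # Normalize case/whitespace before comparing, since model
        # outputs may vary (an assumption about the comparison).
        if prediction.strip().lower() in miss_phrases:
            n_miss += 1
    return n_miss

preds = ["Paris", "I don't know", "i don't know", "42"]
print(count_misses(preds))  # → 2
```

If the server-side evaluator compares the raw string without lowercasing, answers like "I don't know" (capitalized) would not be counted, which could explain a local/remote n_miss mismatch.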