Clarification about the evaluation process

  1. Regarding participation eligibility, is my understanding correct?
  • Phase 1: All teams can participate
  • Phase 2: Only teams that successfully submit in Phase 1 can participate
  • Final Round: Only the top 10 teams in Phase 2, based on automatic evaluation, can participate
  1. How is the final submission selected? Can we change from the best leaderboard submission?

  2. Is there no length limit for the final evaluation? (The limit is 75 tokens for automatic evaluation)

    Full responses are manually checked for hallucinations.

  3. How is the generation of the first token detected?

    A 10-second timeout starts after the first token is generated.

  4. How is time per sample measured in the batch generation pipeline?

    Only answer texts generated within 30 seconds are considered.

  5. If we exceed the time limit, will we be immediately disqualified, or will only that sample be counted as wrong (or missing)?

  6. Is a missing answer required to be an exact match to “I don’t know,” or are similar responses acceptable in manual evaluation? Which of the following statements is correct?

    Missing (e.g., “I don’t know”, “I’m sorry I can’t find …”) → Score: 0.0

    All missing answers should return a standard response: “I don’t know.”

@yilun_jin8 @mohanty
can you check these questions?

@yilun_jin8
Any updates? If there are some questions you can’t answer, please let me know.
Thanks.

@yilun_jin8

To be honest, I don’t really understand why you replied, made changes, then deleted your response and are now staying silent about this post.
If a certain question can’t be answered, that’s totally fine. Please just let me know.

Hi.

I agree that your questions are highly pertinent, and should be answered as soon as possible.

However, I honestly cannot answer these questions myself and have to raise them with the organizers from Meta. We have summarized your questions and raised them with the organizers multiple times, but we have not received any reply yet.

I understand your anxiety about the ambiguity in the rules, and I will do my best to communicate with the organizers to get answers. The answers to your questions would benefit all participants.

Yilun.

I see, thank you for the clarification.
I appreciate you reaching out to the organizers. I will proceed on the assumption that we may not receive a response.

Thanks!