Phase 2 launching!

Dear Participants,

We are delighted to inform you that Phase 2 is launching! Before the actual launch, let us walk you through what’s different in Phase 2.

  1. Batch inference interface. Phase 2 will adopt an improved inference interface that supports batch inference (a rough sketch of such an interface is given after this list). Moreover, with the support of VLLM (we will provide a baseline implementation), inference is significantly accelerated, allowing for a more generous submission quota.
  2. Eligibility of Phase 2. Due to the acceleration brought by batch inference and VLLM, we are glad to announce that all teams who beat the baseline in Phase 1 will proceed to Phase 2, regardless of your ranking.
  3. More GPUs available. In Phase 2, you will have access to 4 NVIDIA T4 GPUs instead of 2 in Phase 1. You will also have a maximum repo size of 200GB.
  4. More generous submission quota. We have almost doubled the submission quota in Phase 2 compared to Phase 1. Specifically, you can now make at most 5 submissions to each track per week.
  5. More informed decision of time limits. In Phase 1, the time limit was set to a value that was insufficient for many 7B models (e.g., Mistral). To address this, for the launch of Phase 2 we benchmarked two popular models, LLaMA3-8B-Instruct and QWen7B, and set the time limit accordingly. With the support of VLLM, the inference times of the two models, as well as the time limits, are as follows.

    We will impose a 10-second limit for a single sample. Although this is shorter than the 15-second limit in Phase 1, given that the average per-sample time is only ~0.2s, it is highly unlikely that any single sample will take more than 10 seconds.
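
    To illustrate item 1, here is a minimal sketch of what a batch inference interface backed by VLLM could look like. The class name `BatchModel`, the method `batch_predict`, the model path, and the sampling settings are all assumptions for illustration; the actual starter-kit interface may differ.

    ```python
    from vllm import LLM, SamplingParams


    class BatchModel:
        """Illustrative batch-inference wrapper; not the official starter-kit API."""

        def __init__(self, model_path: str = "meta-llama/Meta-Llama-3-8B-Instruct"):
            # Load the model once. vLLM handles batching and KV-cache management internally.
            # tensor_parallel_size=4 assumes the 4 x T4 setup of Phase 2; dtype="half"
            # because T4 GPUs do not support bfloat16.
            self.llm = LLM(model=model_path, tensor_parallel_size=4, dtype="half")
            self.sampling_params = SamplingParams(temperature=0.0, max_tokens=100)

        def batch_predict(self, prompts: list[str]) -> list[str]:
            # One generate() call over the whole batch is much faster than
            # looping over samples one at a time.
            outputs = self.llm.generate(prompts, self.sampling_params)
            return [out.outputs[0].text for out in outputs]
    ```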

Phase 2 is currently live and accepting submissions. We have a new starter-kit available here. Phase 2 will end at 23:55 UTC, July 10, 2024. Looking forward to your ingenious solutions!

Best,
Amazon KDD Cup 2024 Organizers

I notice that the AIcrowd website says “Round 2: 21 days left”, which implies that Phase 2 ends on June 15th. Is this the correct end of Phase 2?

No, that was a mistake. I will fix it very soon.

I noticed that the competition guidelines mention that Phase 2 will feature “more and harder tasks.” However, I recently checked the leaderboard and saw that one of the teams has submitted their results for Phase 2, and their score is exactly the same as their Phase 1 score.

Could you please clarify if the data has not been changed between phases, or if there might be an error in the task description? Is it normal to see identical scores across both phases?

There was a bug in the evaluator setup (the evaluators had probably not been replaced yet). We have just fixed it and re-queued their submissions.

Thank you for answering my previous question.

By the way, does the inference time per sample need to be determined based on the maximum length of the answer? For example, the official test shows an average inference time per sample of less than 0.2 seconds. However, if an answer contains more than 50 tokens, it is very likely to cause a timeout and result in a submission failure. This seems to be an issue that cannot be optimized.

It is highly unlikely that any answer will take more than 10 seconds. The decoding step of LLM inference takes O(l) time, where l is the generation length. If 50 tokens take 0.5 seconds, you would need to generate about 1,000 tokens to reach the 10-second limit, which is far more than any question requires.
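
As a back-of-envelope check of the linear-decoding argument above (the 0.01 s/token figure is simply the ratio implied by “50 tokens in 0.5 seconds”, not a measured number):

```python
# Rough per-sample time estimate under linear decoding cost (illustrative numbers only).
per_token_seconds = 0.5 / 50       # ~0.01 s/token, implied by "50 tokens take 0.5 s"
time_limit_seconds = 10.0

max_tokens_within_limit = time_limit_seconds / per_token_seconds
print(max_tokens_within_limit)     # 1000.0 -> ~1k generated tokens needed to hit the limit
```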
