Some questions about the time limit

In the Overview, it’s mentioned that

Each example will have a time-out limit of 10 seconds.

Does this mean that, for each query input, the final answer must be returned within 10 seconds of calling generate_answer()?

In the Rules, it’s mentioned that

A time-out is applied after the first token was generated.

However, the baseline isn’t stream-based, so what does “first token” mean in this context?
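
To illustrate what I mean: if the baseline were streaming, I would expect the "first token" timestamp to be measured roughly like this (a minimal sketch; stream_generate is a hypothetical streaming API, since the baseline exposes none):

```python
import time

def answer_with_ttft(model, query):
    """Collect a streamed answer and record the time-to-first-token."""
    start = time.perf_counter()
    first_token_latency = None
    tokens = []
    # stream_generate is hypothetical: a generator yielding tokens one by one.
    for token in model.stream_generate(query):
        if first_token_latency is None:
            # Latency from the call until the first token arrives.
            first_token_latency = time.perf_counter() - start
        tokens.append(token)
    return "".join(tokens), first_token_latency
```

With a non-streaming generate_answer(), there is no observable point at which a "first token" exists, which is why the rule is confusing to me.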

Additionally, will the 10-second limit be reconsidered and relaxed? I tested the full-precision Llama 7B baseline for task 3 on an RTX 3090: embedding + retrieval takes 1-2 seconds, while the entire generate_answer() call takes around 7 seconds.
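
For reference, I measured the stages roughly like this (a sketch; retrieve and generate are placeholders for whatever the baseline actually calls inside generate_answer()):

```python
import time

def timed_generate_answer(model, query):
    t0 = time.perf_counter()
    # Stage 1: embed the query and retrieve supporting passages.
    passages = model.retrieve(query)  # placeholder for the baseline's retrieval step
    t1 = time.perf_counter()
    # Stage 2: run LLM generation over the query plus the retrieved context.
    answer = model.generate(query, passages)  # placeholder for the generation step
    t2 = time.perf_counter()
    print(f"retrieval: {t1 - t0:.2f}s, generation: {t2 - t1:.2f}s, total: {t2 - t0:.2f}s")
    return answer
```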

After testing more data, I found that some samples already exceed the 10-second limit. :sweat_smile:

We apologize for the confusion. The time limit is 10 seconds to predict a single sample, counted from the moment we call generate_answer() until we receive a response. Hopefully that clarifies your question.
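
In other words, the check is equivalent to wall-clock timing around the call, roughly like this (a sketch of the semantics only, not the actual evaluation harness; the names here are illustrative):

```python
import time

TIME_LIMIT_S = 10.0  # per-sample budget stated in the rules

def evaluate_sample(participant_model, query):
    # The entire generate_answer() call is on the clock,
    # from invocation until a response is received.
    start = time.perf_counter()
    answer = participant_model.generate_answer(query)
    elapsed = time.perf_counter() - start
    timed_out = elapsed > TIME_LIMIT_S
    return answer, elapsed, timed_out
```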

Thanks for your reply! But I have two more questions.

  1. In the Rules, it’s mentioned that

A time-out is applied after the first token was generated.

However, the baseline isn’t stream-based, so what does “first token” mean in this context?

  2. When I submit the code for evaluation, if a single sample exceeds the time limit, will the entire submission fail?

Looking forward to your reply!
