In the Overview, it’s mentioned that
Each example will have a time-out limit of 10 seconds.
Does this mean that after calling generate_answer() for each query input, the final answer must be returned within 10 seconds?
In the Rules, it’s mentioned that
A time-out is applied after the first token was generated.
However, the baseline isn’t stream-based, so what does “first token” mean in this context?
Additionally, will the 10-second limit be reconsidered and relaxed? I tested the full-precision llama7b baseline for task 3 on an RTX 3090: the embedding + retrieval step takes 1-2 seconds, while the entire generate_answer() call takes around 7 seconds.
After testing on more data, some samples have already exceeded the 10-second limit.
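For anyone wanting to reproduce this kind of per-stage measurement, here is a minimal timing sketch. The `retrieve` and `generate` stubs below are hypothetical stand-ins (they just sleep) for the baseline's actual retrieval and generation steps; only the `generate_answer()` name comes from the starter kit.

```python
import time

# Hypothetical stand-ins for the baseline's two stages; in the real
# baseline these would be the embedding+retrieval and LLM generation calls.
def retrieve(query):
    time.sleep(0.01)  # placeholder for embedding + retrieval
    return ["doc"]

def generate(query, docs):
    time.sleep(0.02)  # placeholder for LLM generation
    return "answer"

def generate_answer(query):
    t0 = time.perf_counter()
    docs = retrieve(query)
    t1 = time.perf_counter()
    answer = generate(query, docs)
    t2 = time.perf_counter()
    print(f"retrieval: {t1 - t0:.2f}s, "
          f"generation: {t2 - t1:.2f}s, "
          f"total: {t2 - t0:.2f}s")
    return answer

generate_answer("example query")
```

Wrapping each stage with `time.perf_counter()` makes it easy to see which part of the pipeline is eating the 10-second budget.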
We apologize for the confusion. The time limit is 10 seconds per sample. It is counted from the time we call generate_answer() until we receive a response. Hopefully that clarifies your question.
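To illustrate how such a per-sample window could be enforced, here is a hedged sketch using `concurrent.futures`. This is only an assumption about the mechanism, not the organizers' actual harness, and the `generate_answer()` body is a sleeping stub standing in for the real model call.

```python
import concurrent.futures
import time

TIME_LIMIT_S = 10  # per-sample budget described by the organizers

def generate_answer(query):
    # Stand-in for the real model call from the starter kit.
    time.sleep(0.05)
    return f"answer to {query}"

def predict_with_timeout(query, limit=TIME_LIMIT_S):
    # The clock starts when generate_answer() is submitted and stops
    # when its result is received, matching the clarification above.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(generate_answer, query)
        try:
            return future.result(timeout=limit)
        except concurrent.futures.TimeoutError:
            return None  # this sample exceeded the budget

print(predict_with_timeout("example query"))
```

Note that `Future.result(timeout=...)` only stops waiting; it does not kill the worker thread, so a real evaluator would likely run each sample in a separate process it can terminate.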
Thanks for your reply! But I have two more questions.
- In the Rules, it’s mentioned that
A time-out is applied after the first token was generated.
However, the baseline isn’t stream-based, so what does “first token” mean in this context?
- When I submit the code for evaluation, if a single sample exceeds the time limit, will the entire submission fail?
Looking forward to your reply!