Has the evaluation system broken down?

I noticed that everyone is failing with ACPL 1000.

Hello @completely_lost

Can you try submitting the model with the --num-games 1 flag? The failures might be caused by the Trainium device getting overloaded. --num-games controls how many games run concurrently (we also set the vLLM --max-num-seqs parameter to the same value).
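
To illustrate what a concurrency limit like --num-games does, here is a minimal Python sketch. It is not the evaluator's actual code: `play_game`, `run_games`, and `peak_concurrency` are hypothetical names, and the sleep is only a placeholder for model inference. It assumes the limit behaves like a semaphore that caps how many games are in flight at once.

```python
import asyncio


async def play_game(game_id, limiter, stats):
    # Hypothetical stand-in for one evaluation game.
    async with limiter:  # wait until a concurrency slot is free
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # placeholder for the model's inference calls
        stats["active"] -= 1
    return game_id


async def run_games(total_games, num_games):
    # num_games plays the role of --num-games: at most this many games run at once.
    limiter = asyncio.Semaphore(num_games)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(play_game(i, limiter, stats) for i in range(total_games))
    )
    return results, stats["peak"]


def peak_concurrency(total_games, num_games):
    # Returns the highest number of games that were in flight simultaneously.
    _, peak = asyncio.run(run_games(total_games, num_games))
    return peak
```

With a limit of 1, the games run strictly one after another, so the device only ever serves a single game's requests at a time. That is slower, but it makes overload less likely.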