I noticed everyone is failing with ACPL 1000?
Hello @completely_lost
Could you try submitting the model with the --num-games 1 flag? The failures might be caused by the Trainium device getting overloaded. --num-games controls how many games run concurrently (we also set the vLLM --max-num-seqs parameter to the same value).
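To illustrate what a concurrency cap like this does, here is a minimal Python sketch. It is not the actual evaluation harness: play_game, run_games, and peak_concurrency are hypothetical names, and the sleep is only a stand-in for model inference. It assumes the cap behaves like a semaphore that lets at most num_games games run at once.

```python
import asyncio


async def play_game(game_id, sem, state):
    """Hypothetical stand-in for one game; the sleep simulates inference."""
    async with sem:  # wait here until a concurrency slot is free
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)
        state["active"] -= 1
    return game_id


async def run_games(total_games, num_games):
    """Run total_games games with at most num_games in flight at once."""
    sem = asyncio.Semaphore(num_games)  # assumed analogue of --num-games
    state = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(play_game(i, sem, state) for i in range(total_games))
    )
    return results, state["peak"]


def peak_concurrency(total_games, num_games):
    """Return the highest number of games that ran simultaneously."""
    _, peak = asyncio.run(run_games(total_games, num_games))
    return peak
```

With a cap of 1 the games run strictly one after another, so the device only ever serves one sequence at a time, which is the situation the suggestion above is meant to test.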