Possible evaluator change: concurrency now 4 (was 1) — Qwen3/Neuron vLLM outputs corrupted (finish_reason=length, missing <uci_move>)

Hi AIcrowd team — thanks for running the Global Chess Challenge.

I’m seeing what looks like a recent regression for Qwen3 (neuron.model_type=qwen3) on the Neuron/vLLM backend: at higher evaluator concurrency the model often produces garbled output, hits max tokens, and fails to reliably emit <uci_move>...</uci_move>, which causes immediate resignations and extremely high ACPL.

What changed (evidence from evaluation-state logs)

Looking at the config_snapshot field from GET /submissions/<id>/evaluation-state:

  • Submission 305873 (Dec 23): config_snapshot.concurrency = 1
    • finish_reason=stop (100%), reasonable completion lengths, <uci_move> present reliably
    • Overall ACPL ≈ 119
  • Recent submissions (Dec 24) now show config_snapshot.concurrency = 4
    • Example 305972: config_snapshot.concurrency = 4
    • finish_reason=length (~100%), completion tokens always hit the cap, <uci_move> rate ~25%
    • Outputs often look corrupted (binary-like text), leading to resignations and an ACPL of roughly 864 or higher
    • This submission used the same settings as 305873 (vllm.max_model_len=512, dtype=bfloat16, enforce_eager=true, max_tokens=64), so concurrency is the only difference I can see between the two runs.
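In case it helps with reproducing the numbers above, here is a minimal sketch of how the finish_reason split and the <uci_move> rate can be tallied from saved responses. It assumes each saved response follows the OpenAI-compatible chat-completion shape (choices[0].finish_reason and choices[0].message.content); the function name summarize and the UCI regex are my own illustration, not part of the evaluator.

```python
import re
from collections import Counter

# Matches a tagged move such as <uci_move>e2e4</uci_move> or <uci_move>e7e8q</uci_move>.
UCI_TAG = re.compile(r"<uci_move>\s*([a-h][1-8][a-h][1-8][qrbn]?)\s*</uci_move>")


def summarize(responses):
    """Tally finish_reason values and the share of responses with a valid move tag.

    `responses` is assumed to be a list of dicts in the OpenAI-compatible
    chat-completion shape; this is an assumption about the saved logs.
    """
    reasons = Counter()
    with_move = 0
    for resp in responses:
        choice = resp["choices"][0]
        reasons[choice["finish_reason"]] += 1
        text = choice["message"]["content"] or ""
        if UCI_TAG.search(text):
            with_move += 1
    total = len(responses)
    return {
        "total": total,
        "finish_reason": dict(reasons),
        "uci_move_rate": with_move / total if total else 0.0,
    }
```

Running this over the responses from a concurrency=1 run and a concurrency=4 run should make the difference between the two easy to compare side by side.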

I also tried explicitly requesting --num-games 1 / --concurrency 1 in aicrowd submit-model (e.g. submission 305974), but the resulting evaluation logs still show concurrency=4 (and num_games=4), which suggests the evaluator overrides these values.
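For anyone who wants to check their own submissions for the same override, a small sketch follows. It assumes the evaluation-state response has already been parsed into a dict and that both concurrency and num_games sit under config_snapshot (I have only confirmed concurrency there; the placement of num_games is an assumption). The helper name find_overrides is hypothetical.

```python
def find_overrides(requested, evaluation_state):
    """Return {key: (requested_value, effective_value)} for settings that differ.

    `evaluation_state` is assumed to be the parsed JSON from the
    evaluation-state endpoint, with a `config_snapshot` dict inside it.
    Keys missing from the snapshot are skipped rather than reported.
    """
    snapshot = evaluation_state.get("config_snapshot", {})
    return {
        key: (value, snapshot[key])
        for key, value in requested.items()
        if key in snapshot and snapshot[key] != value
    }


# Example with the values described above: both settings were requested as 1.
state = {"config_snapshot": {"concurrency": 4, "num_games": 4}}
overrides = find_overrides({"concurrency": 1, "num_games": 1}, state)
```

An empty result would mean the requested values were honored; a non-empty one lists each setting the evaluator replaced.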

Questions

  1. Did the evaluator concurrency change recently from 1 → 4 for this challenge?
  2. If so, is there a recommended configuration (vLLM/Neuron flags or supported submission fields) to keep Qwen3 stable at concurrency=4?
  3. Is this a known Neuron/vLLM issue/regression for Qwen3 under concurrent request load?

I’m happy to provide additional req/resp snippets (showing finish_reason=length + corrupted outputs) if that helps debugging. If this should be handled privately instead of on the forum, let me know and I can share details via DM/support.

Thanks again for the challenge and for any guidance here.
