Submissions stuck at "Compiling model for Neuron"

Issue
My submissions are getting stuck at “Compiling model for Neuron - Hold tight, this will just take a moment” for 1.5+ hours without completing.

Timeline

  • 2 days ago: Submissions worked fine.
  • Today: Multiple submissions stuck at the Neuron compilation step.

What Changed
I noticed AIcrowd added new Neuron documentation on Dec 18 with the --neuron.model-type flag. My earlier successful submissions didn’t use any Neuron flags.

What I’ve Tried

  1. Recent submissions with --neuron.model-type llama - stuck
  2. Submissions with --vllm.max-model-len 2048 - also stuck
  3. Submission WITHOUT neuron flags (testing if old approach still works) - stuck

Model Details

  • Model: Llama-based
  • Architecture: meta-llama/Llama-3.2-8B-Instruct base
  • Repo tag: main
  • Prompt template: Custom Jinja template (chess.jinja)

Questions:

  1. Did something change in the evaluation infrastructure around Dec 18?
  2. Should we use --neuron.model-type or avoid it?
  3. Are other participants experiencing similar Neuron compilation hangs?

Any guidance would be appreciated!

I have the same issue with Llama 3.1 8B. I can confirm that it compiles and evaluates on an AWS trn1.2xlarge instance but gets stuck here. Perhaps it’s due to a config mismatch?

It's frustrating, tbh. I wasted 10+ hours of GPU credits and then it refuses to evaluate, with no apparent reason. It was working before the Neuron step was introduced. And no one from the team is helping; I even emailed one of them.


@artist @whoamananand it seems like the models are hitting memory limits. Can you share the config params you used to compile the model on trn1.2xlarge so that we can investigate this further?

Base Model: meta-llama/Llama-3.1-8B-Instruct

Model Architecture:

  • Parameters: ~8.03B (8,030,261,248)
  • Hidden size: 4096
  • Num attention heads: 32
  • Num hidden layers: 32
  • Vocab size: 128,256
  • Max position embeddings: 131,072
  • Torch dtype: bfloat16
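If the memory-limit hypothesis above is right, the raw numbers already look tight. A quick back-of-the-envelope sketch, using only the parameter count and dtype listed above (the 32 GB figure for the trn1.2xlarge accelerator is my assumption, not from this thread):

```python
# Rough weight-memory estimate for the model described above.
PARAMS = 8_030_261_248       # ~8.03B parameters (from the post)
BYTES_PER_PARAM = 2          # bfloat16 = 2 bytes per parameter

weight_gib = PARAMS * BYTES_PER_PARAM / 2**30
print(f"weights alone: {weight_gib:.2f} GiB")  # ~14.96 GiB

# Assumed trn1.2xlarge accelerator memory (single Trainium chip): ~32 GB.
# Weights consume roughly half of it before any KV cache or
# compilation workspace is accounted for.
```

So even before the KV cache, nearly half the device memory goes to weights, which would make the compiler's headroom sensitive to the sequence-length settings.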

Can you tell me how to submit this properly?

Would it be possible to allow uploading pre-compiled models?

Running into the same issue, unable to submit it.

I have the same issue with models on top of Qwen 3 8B: AIcrowd | Global Chess Challenge 2025 | Submissions #305758

@whoamananand @jyotish

I got Llama 3.1 8B working with the following configuration:

HF_REPO_TAG=main
NEURON_MODEL_TYPE=llama
VLLM_MAX_MODEL_LEN=512
VLLM_MAX_NUM_BATCHED_TOKENS=512
VLLM_MAX_NUM_SEQS=1
VLLM_DTYPE=bfloat16
VLLM_ENFORCE_EAGER=true
VLLM_INFERENCE_MAX_TOKENS=64
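The key setting here is likely `VLLM_MAX_MODEL_LEN=512`, which caps the KV cache. A rough sketch of why, using the architecture numbers posted earlier in the thread; the 8 KV heads figure is my assumption based on Llama 3.1 8B's grouped-query attention config, not something stated in this thread:

```python
# KV cache size per token = 2 (K and V) * num_kv_heads * head_dim
#                           * bytes_per_elem * num_layers
NUM_LAYERS = 32
NUM_KV_HEADS = 8                   # assumed (Llama 3.1 8B GQA config)
HEAD_DIM = 4096 // 32              # hidden_size / num_attention_heads = 128
BYTES = 2                          # bfloat16

per_token = 2 * NUM_KV_HEADS * HEAD_DIM * BYTES * NUM_LAYERS  # 131072 B

# Full context (max_position_embeddings) vs. the capped value:
full_gib = 131_072 * per_token / 2**30   # 16.0 GiB per sequence
capped_mib = 512 * per_token / 2**20     # 64.0 MiB per sequence
print(full_gib, capped_mib)
```

With the full 131,072-token context, a single sequence's KV cache alone would need ~16 GiB on top of ~15 GiB of weights, which plausibly explains the compilation hang; capping it at 512 tokens drops that to ~64 MiB.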