Submissions stuck at "Compiling model for Neuron"

Issue
My submissions are getting stuck at “Compiling model for Neuron - Hold tight, this will just take a moment” for 1.5+ hours without completing.

Timeline

  • 2 days ago: Submissions worked fine.
  • Today: Multiple submissions stuck at the Neuron compilation step.

What Changed
I noticed AIcrowd added new Neuron documentation on Dec 18 with the --neuron.model-type flag. My earlier successful submissions didn’t use any Neuron flags.

What I’ve Tried

  1. Recent submissions with --neuron.model-type llama - stuck
  2. Submissions with --vllm.max-model-len 2048 - also stuck
  3. Submission WITHOUT neuron flags (testing if old approach still works) - stuck

Model Details

  • Model: Llama-based
  • Architecture: meta-llama/Llama-3.2-8B-Instruct base
  • Repo tag: main
  • Prompt template: Custom Jinja template (chess.jinja)

Questions:

  1. Did something change in the evaluation infrastructure around Dec 18?
  2. Should we use --neuron.model-type or avoid it?
  3. Are other participants experiencing similar Neuron compilation hangs?

Any guidance would be appreciated!

I have the same issue with Llama 3.1 8B. I can confirm that it compiles and evaluates on an AWS trn1.2xlarge instance but gets stuck here. Perhaps it’s due to a config mismatch?

It's frustrating, tbh. I wasted 10+ hours of GPU credits and then it refuses to evaluate, with no apparent reason. It was working before the Neuron step was introduced. And no one from the team is helping; I even emailed one of them.


@artist @whoamananand it seems like the models are hitting memory limits. Can you share the config params you used to compile the model on trn1.2xlarge so that we can investigate this further?

Base Model: meta-llama/Llama-3.1-8B-Instruct

Model Architecture:

  • Parameters: ~8.03B (8,030,261,248)
  • Hidden size: 4096
  • Num attention heads: 32
  • Num hidden layers: 32
  • Vocab size: 128,256
  • Max position embeddings: 131,072
  • Torch dtype: bfloat16
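If the memory-limit hypothesis above is right, the raw numbers already look tight. A quick back-of-the-envelope sketch, using only the parameter count and dtype listed above (the 32 GB figure for the trn1.2xlarge accelerator is my assumption, not from this thread):

```python
# Rough weight-memory estimate for the model described above.
PARAMS = 8_030_261_248       # ~8.03B parameters (from the post)
BYTES_PER_PARAM = 2          # bfloat16 = 2 bytes per parameter

weight_gib = PARAMS * BYTES_PER_PARAM / 2**30
print(f"weights alone: {weight_gib:.2f} GiB")  # ~14.96 GiB

# Assumed trn1.2xlarge accelerator memory (single Trainium chip): ~32 GB.
# Weights consume roughly half of it before any KV cache or
# compilation workspace is accounted for.
```

So even before the KV cache, nearly half the device memory goes to weights, which would make the compiler's headroom sensitive to the sequence-length settings.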

Can you tell me how to submit this properly?

Would it be possible to allow uploading pre-compiled models?

Running into the same issue, unable to submit it.

I have the same issue with models on top of Qwen 3 8B: AIcrowd | Global Chess Challenge 2025 | Submissions #305758

@whoamananand @jyotish

I got Llama 3.1 8B working with the following configuration:

HF_REPO_TAG=main
NEURON_MODEL_TYPE=llama
VLLM_MAX_MODEL_LEN=512
VLLM_MAX_NUM_BATCHED_TOKENS=512
VLLM_MAX_NUM_SEQS=1
VLLM_DTYPE=bfloat16
VLLM_ENFORCE_EAGER=true
VLLM_INFERENCE_MAX_TOKENS=64
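The key setting here is likely `VLLM_MAX_MODEL_LEN=512`, which caps the KV cache. A rough sketch of why, using the architecture numbers posted earlier in the thread; the 8 KV heads figure is my assumption based on Llama 3.1 8B's grouped-query attention config, not something stated in this thread:

```python
# KV cache size per token = 2 (K and V) * num_kv_heads * head_dim
#                           * bytes_per_elem * num_layers
NUM_LAYERS = 32
NUM_KV_HEADS = 8                   # assumed (Llama 3.1 8B GQA config)
HEAD_DIM = 4096 // 32              # hidden_size / num_attention_heads = 128
BYTES = 2                          # bfloat16

per_token = 2 * NUM_KV_HEADS * HEAD_DIM * BYTES * NUM_LAYERS  # 131072 B

# Full context (max_position_embeddings) vs. the capped value:
full_gib = 131_072 * per_token / 2**30   # 16.0 GiB per sequence
capped_mib = 512 * per_token / 2**20     # 64.0 MiB per sequence
print(full_gib, capped_mib)
```

With the full 131,072-token context, a single sequence's KV cache alone would need ~16 GiB on top of ~15 GiB of weights, which plausibly explains the compilation hang; capping it at 512 tokens drops that to ~64 MiB.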