Submitting models to Neuron: pick the right `--neuron.model-type` (and tune vLLM if you need to)
When you run `aicrowd submit-model`, the platform spins up a vLLM server for your model. You can pass a handful of `--vllm.*` flags to control things like max context length, dtype, batching limits, LoRA settings, and a few …
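
For concreteness, here is a minimal sketch of what a submission might look like. The exact `--vllm.*` flag names below are assumptions modeled on vLLM's standard engine arguments (`max-model-len`, `dtype`, `max-num-seqs`, `enable-lora`, `max-lora-rank`), and `causal-lm` is a placeholder `--neuron.model-type` value, so check the challenge documentation for the names and values your round actually accepts.

```bash
# Sketch only: flag names mirror vLLM engine args and are assumptions,
# not a confirmed list. Verify against the challenge docs before submitting.
aicrowd submit-model \
  --neuron.model-type causal-lm \
  --vllm.max-model-len 8192 \
  --vllm.dtype bfloat16 \
  --vllm.max-num-seqs 64 \
  --vllm.enable-lora \
  --vllm.max-lora-rank 32
# --vllm.max-model-len  : max context length the server will accept
# --vllm.dtype          : weight/activation precision
# --vllm.max-num-seqs   : batching limit (concurrent sequences per step)
# --vllm.enable-lora /
# --vllm.max-lora-rank  : only relevant if you submit LoRA adapters
```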