Can we set the vLLM sampling parameters (temp, top_p, top_k etc.) for a submission?
They’re using vLLM, and I think if you add a generation config file (`generation_config.json`) to your model folder alongside the weights, it should pick up those sampling settings when running your model.
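In case it helps, here is a rough sketch of writing such a file. It assumes the Hugging Face `generation_config.json` convention. The helper name `write_generation_config`, the folder name `my_model`, and the sampling values are placeholders I made up for illustration, not anything the organizers specified. I have not confirmed that the evaluation harness actually reads this file, so treat it as something to try rather than a guarantee.

```python
import json
from pathlib import Path


def write_generation_config(model_dir, temperature=0.7, top_p=0.9, top_k=50):
    """Write a Hugging Face style generation_config.json into model_dir.

    The helper name and default values are hypothetical placeholders.
    """
    config = {
        "do_sample": True,          # sampling must be on for the values below to matter
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
    }
    path = Path(model_dir) / "generation_config.json"
    path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    path.write_text(json.dumps(config, indent=2))
    return path


# Example usage (hypothetical folder name):
# write_generation_config("my_model", temperature=0.6, top_p=0.95, top_k=40)
```

The file sits next to `config.json` and the weights in the submitted model folder. If the harness passes its own sampling parameters explicitly, those would likely take precedence over anything in this file.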