I want to build an Ollama Docker container and have my model query the Ollama API to get answers. Is that allowed? A rough sketch of what I mean is below.
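For context, a minimal sketch of the setup I have in mind (untested; the model name is just a placeholder for whatever model I would pull into Ollama):

```python
# Rough sketch: Ollama served from its official image, e.g.
#   docker run -d -p 11434:11434 --name ollama ollama/ollama
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",   # default Ollama REST endpoint
    json={
        "model": "llama3",                   # placeholder model name
        "prompt": "What is retrieval-augmented generation?",
        "stream": False,                     # return one JSON object instead of a stream
    },
    timeout=120,
)
print(resp.json()["response"])               # generated answer text
```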
@lu_kun: Yes, you are allowed to do that. That said, the new baseline includes a vLLM example with batched offline inference, which might be simpler to build on top of.
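For reference, batched offline inference with vLLM looks roughly like the sketch below (not the baseline's actual code; the model name and sampling parameters are placeholders):

```python
from vllm import LLM, SamplingParams

# Prompts are processed together as a batch, which is what "offline inference" refers to here.
prompts = [
    "What is retrieval-augmented generation?",
    "Summarize the idea of batched inference in one sentence.",
]
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)

# Placeholder model; substitute whatever model the baseline actually uses.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt)
    print(output.outputs[0].text)
```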