Does my model process one prompt at a time and return a single response, or can it handle a batch of prompts at once and return a list of responses?
At present, we support only the first type of predict(): one prompt in, one response out. We will support the second type (batch inference) later, before Phase 2.
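For illustration, here is a minimal sketch of the distinction, assuming a hypothetical `MyModel` class (the class name and signature are illustrative, not the actual API). Until batch inference lands, a list of prompts has to be handled by looping over single-prompt calls:

```python
class MyModel:
    def predict(self, prompt: str) -> str:
        """Process a single prompt and return a single response."""
        # Hypothetical inference call; replace with the real model invocation.
        return f"response to: {prompt}"

model = MyModel()

# Supported today: one prompt in, one response out.
response = model.predict("What is the capital of France?")

# Workaround until batch inference is supported: loop over
# single-prompt calls to produce a list of responses.
prompts = ["prompt one", "prompt two"]
responses = [model.predict(p) for p in prompts]
```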