I tried running the identical code on two separate occasions, changing only the baseline model to Mistral-7B and making no other adjustments. Both times the process timed out without producing any errors. However, according to the overview documentation, Mistral-7B should run smoothly on this machine.
We use Mistral-7B and ran into the same issue. We tried quantizing the model to int4, which generally reduces memory consumption and speeds up inference, but it doesn't seem to help.
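For reference, this is roughly what we tried for the int4 loading, as a minimal sketch using transformers + bitsandbytes; the checkpoint id and generation settings below are placeholders, not our actual submission code:

```python
# Minimal int4 loading sketch (assumes transformers + bitsandbytes are installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # int4 weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # T4 has no bf16 support
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

This keeps the memory footprint down, but the generation itself is still what eats the time budget.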
We just ran a small test and found that Mistral inference is slower than Vicuna's, which may indeed lead to a timeout. "Smoothly" here refers to GPU memory, not the overall time limit.
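If anyone wants to reproduce the comparison, something along these lines is enough; the model ids and prompt are only illustrative, not the competition pipeline:

```python
# Rough timing sketch: measure generation throughput for one model,
# then repeat with the other checkpoint and compare tokens/sec.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def tokens_per_second(model_id: str, prompt: str, max_new_tokens: int = 128) -> float:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.time()
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    elapsed = time.time() - start
    new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
    return new_tokens / elapsed

# Example: run once per model (a Mistral-7B and a Vicuna-7B checkpoint) on the same prompt.
print(tokens_per_second("mistralai/Mistral-7B-v0.1", "Summarize: ..."))
```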
@dailin_li @chicken_li you can contact me on WeChat (Aalanyj) or by email (jyl.jal123@gmail.com) for a faster reply, if you need.
Mistral always failed in my submissions last week, lol
What is the hosts' point in constraining submission time and GPU so tightly?
Do you only want speed- and memory-efficient solutions using small LLMs on T4 GPUs?
@giba That is exactly the point. Scaling always works, but if you cannot fit it within a reasonable resource budget, it is not that helpful.