Has anybody tried deploying three llama3-8b models on four T4s? I wonder whether it's possible to run three llama3-8b models at the same time.
How large is each llama3-8b model? If each one is about 16GB, it should be possible.
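For reference, a rough back-of-envelope for the weight footprint, assuming roughly 8B parameters stored in half precision (FP16/BF16, 2 bytes per parameter); the exact number depends on the checkpoint and any quantization, and it excludes KV cache and runtime overhead:

```python
# Rough weight-memory estimate for llama3-8b in half precision (FP16/BF16).
# Assumption: ~8e9 parameters at 2 bytes each; real checkpoints differ slightly.
params = 8.0e9           # approximate parameter count for llama3-8b
bytes_per_param = 2      # FP16 / BF16
weights_gib = params * bytes_per_param / (1024 ** 3)
print(f"weights only: ~{weights_gib:.1f} GiB")  # ~15 GiB, before KV cache / activations
```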
I tested it on my machine, and each model takes about 20GB of GPU memory, so I wonder whether it can be deployed online.
There are 4 T4 cards, but I see that the total GPU memory is < 60GB.
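A quick way to check what is actually available on each card before deciding placement; this is just a sketch using PyTorch's device queries, not tied to any particular serving framework:

```python
import torch

# Report free/total memory on each visible GPU.
# A T4 is nominally 16GB, but the driver and other processes reserve some of it,
# so the usable total across 4 cards can come in under the nominal 64GB.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"GPU {i}: {free / 1024**3:.1f} GiB free / {total / 1024**3:.1f} GiB total")
```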