CUDA OOM in generate predictions stage

s-abramov · August 29, 2021, 12:39pm

I got a CUDA OOM error in the generate predictions stage:
RuntimeError: CUDA out of memory. Tried to allocate 146.00 MiB (GPU 0; 15.78 GiB total capacity; 7.99 GiB already allocated; 146.75 MiB free; 14.43 GiB reserved in total by PyTorch)
But my model fits in about 3500 MB of video RAM on my desktop, so it should obviously fit in the V100's memory.
Could you help with this issue, please?
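
To make the numbers in the error message easier to compare, here is a minimal sketch that parses the message and computes how much memory PyTorch has reserved but not currently allocated. The helper name `parse_oom` and the regular expression are illustrative assumptions, not part of PyTorch, and the sketch only reads the text of the message quoted above.

```python
import re

# The error text from the traceback above, copied verbatim.
MESSAGE = (
    "CUDA out of memory. Tried to allocate 146.00 MiB "
    "(GPU 0; 15.78 GiB total capacity; 7.99 GiB already allocated; "
    "146.75 MiB free; 14.43 GiB reserved in total by PyTorch)"
)

# Conversion factors so every figure ends up in GiB.
UNIT_TO_GIB = {"MiB": 1 / 1024, "GiB": 1.0}


def parse_oom(message):
    """Hypothetical helper: extract the memory figures from the message, in GiB."""
    pattern = r"([\d.]+) (MiB|GiB) (total capacity|already allocated|free|reserved)"
    return {
        label: float(value) * UNIT_TO_GIB[unit]
        for value, unit, label in re.findall(pattern, message)
    }


stats = parse_oom(MESSAGE)

# Memory held by PyTorch's caching allocator but not backing live tensors.
gap = stats["reserved"] - stats["already allocated"]

print(f"total capacity:          {stats['total capacity']:.2f} GiB")
print(f"allocated:               {stats['already allocated']:.2f} GiB")
print(f"reserved:                {stats['reserved']:.2f} GiB")
print(f"reserved, not allocated: {gap:.2f} GiB")
```

So roughly 6.44 GiB is reserved by PyTorch without being allocated to tensors, and live tensors take about 8 GiB, which is well above the roughly 3.5 GB I see on my desktop. My guess is that this points to cached or fragmented memory plus something extra being kept alive during prediction, but I have not confirmed that.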