CUDA OOM in generate predictions stage

I got a CUDA OOM error during the generate predictions stage:
RuntimeError: CUDA out of memory. Tried to allocate 146.00 MiB (GPU 0; 15.78 GiB total capacity; 7.99 GiB already allocated; 146.75 MiB free; 14.43 GiB reserved in total by PyTorch)
but my model fits in about 3500 MB of video RAM on my desktop GPU, so it should easily fit in the V100's 16 GB.
Could you help with this issue, please?
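For context, here is a minimal sketch of the kind of memory-saving changes I could try in the prediction loop. It assumes a plain PyTorch inference setup; `model` and `batches` are placeholder names, not the project's actual code.

```python
import torch

def generate_predictions(model, batches, device="cuda"):
    # Placeholder sketch: disable gradient tracking and move results off
    # the GPU as soon as possible to keep peak memory down.
    model.eval()                          # disable dropout / batch norm updates
    outputs = []
    with torch.no_grad():                 # no activations kept for backward
        for batch in batches:
            batch = batch.to(device)
            preds = model(batch)
            outputs.append(preds.cpu())   # copy predictions to host memory
            del batch, preds              # drop GPU references right away
    torch.cuda.empty_cache()              # return cached blocks to the allocator
    return outputs
```

If that is roughly the right direction, I can also try a smaller batch size for this stage, but I would like to understand why the reserved memory (14.43 GiB) is so much larger than the allocated memory (7.99 GiB) in the error above.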