We are trying to use flash-attn to accelerate prediction. It works on our system and shortens the runtime from 5 minutes to about 1 minute 30 seconds. However, we found it hard to install using requirements.txt alone, because we usually install it with pip install flash-attn --no-build-isolation, and requirements.txt cannot express that. We hit an error saying the packaging module cannot be found, even though packaging is already listed in requirements.txt.
I think you could add a separate step such as pip install flash-attn --no-build-isolation
to the Dockerfile.
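
A minimal sketch of what that could look like is below. It is not taken from your actual Dockerfile: the BASE_IMAGE build argument, the /app working directory, and the assumption that requirements.txt lists torch and packaging (but not flash-attn) are all placeholders on my side. Our guess is that the error appears because the flash-attn build needs packaging to be importable at build time, which is only the case if it was installed in an earlier step and build isolation is turned off.

```dockerfile
# Hypothetical base image, passed in with --build-arg BASE_IMAGE=...
# Assumption: it already provides Python, pip, and a CUDA toolkit
# that matches the PyTorch build being installed.
ARG BASE_IMAGE
FROM ${BASE_IMAGE}

# Placeholder working directory.
WORKDIR /app
COPY requirements.txt .

# Step 1: install the regular dependencies first.
# Assumption: requirements.txt lists torch and packaging, and does NOT list flash-attn.
RUN pip install --no-cache-dir -r requirements.txt

# Step 2: install flash-attn on its own, reusing the packages from step 1
# instead of an isolated build environment.
RUN pip install --no-cache-dir flash-attn --no-build-isolation
```

Keeping flash-attn out of requirements.txt and installing it in its own RUN step means the build can see the packages installed in the previous step.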
Thank you