- I noticed that the Dockerfile in the starter kit uses the base image FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04. However, I need a newer version of CUDA, as well as components like NCCL from the CUDA toolkit. Can I change the base image to FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04 and expect it to run without problems?
- Does the official server environment support CUDA versions 12.1 and above? If not, how can we work around this issue?
Thank you for your response. I have one more question about the official server environment. I’ve run into Docker build failures in an offline environment when the CUDA version of the Docker image is higher than that of the host machine. Could you please share the NVIDIA driver version used on the online server, along with any other configuration that might have an impact?
@gaozhanfire: Here is the GPU setup we have on the evaluation nodes:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       On  | 00000000:00:1B.0 Off |                    0 |
| N/A   26C    P8               9W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla T4                       On  | 00000000:00:1C.0 Off |                    0 |
| N/A   25C    P8               9W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla T4                       On  | 00000000:00:1D.0 Off |                    0 |
| N/A   25C    P8               9W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla T4                       On  | 00000000:00:1E.0 Off |                    0 |
| N/A   26C    P8               9W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
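
The "CUDA Version" field in the nvidia-smi banner is the highest CUDA runtime version the installed driver supports, so a quick sanity check is to compare it with the CUDA version in the base image tag. Below is a minimal sketch of that comparison. The helper names (`driver_cuda_version`, `image_cuda_version`, `image_fits_driver`) are hypothetical and not part of the starter kit, and the check is deliberately conservative: it ignores CUDA minor-version compatibility and forward-compatibility packages, which may let some newer images run anyway.

```python
import re


def parse_version(text):
    """Turn a dotted version such as '12.1.0' into a (major, minor) tuple."""
    parts = text.split(".")
    return int(parts[0]), int(parts[1])


def driver_cuda_version(smi_banner):
    """Read the 'CUDA Version' field from the nvidia-smi banner line."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_banner)
    if match is None:
        raise ValueError("no 'CUDA Version' field found in banner")
    return parse_version(match.group(1))


def image_cuda_version(image_tag):
    """Read the CUDA version from a tag like 'nvidia/cuda:12.1.0-...'."""
    match = re.search(r":(\d+\.\d+)", image_tag)
    if match is None:
        raise ValueError("no CUDA version found in image tag")
    return parse_version(match.group(1))


def image_fits_driver(image_tag, smi_banner):
    """True if the image's CUDA version does not exceed the driver's (conservative)."""
    return image_cuda_version(image_tag) <= driver_cuda_version(smi_banner)


# Banner line copied from the evaluation-node output above.
banner = "| NVIDIA-SMI 535.161.08 Driver Version: 535.161.08 CUDA Version: 12.2 |"

for tag in (
    "nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04",
    "nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04",
):
    print(tag, "->", image_fits_driver(tag, banner))
```

Under this check, both the original 11.8.0 image and the proposed 12.1.0 image are at or below the 12.2 reported by the evaluation nodes. On a real machine, the banner string would come from running nvidia-smi on the host rather than from a hard-coded line.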