A few questions about building my Docker image

  1. I noticed that the Dockerfile in the start-kit uses the base image FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04. However, I need a newer version of CUDA, along with components such as NCCL and the full CUDA toolkit. Can I replace the base image with FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04 and expect everything to run smoothly?
  2. Does the official server environment support CUDA versions 12.1 and above? If not, how can we work around this issue?

@gaozhanfire : Yes, you are free to use any base image you please.

Thank you for your response. I have one more question about the server environment used for the official evaluation. I have run into problems with Docker builds in an offline environment: if the CUDA version of the image is higher than the one supported by the host machine, the build may fail. Could you please share the NVIDIA driver version used on the online server, along with any other configuration details that might have an impact?

@gaozhanfire : Here is the GPU setup we have on the evaluation nodes:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       On  | 00000000:00:1B.0 Off |                    0 |
| N/A   26C    P8               9W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla T4                       On  | 00000000:00:1C.0 Off |                    0 |
| N/A   25C    P8               9W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla T4                       On  | 00000000:00:1D.0 Off |                    0 |
| N/A   25C    P8               9W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla T4                       On  | 00000000:00:1E.0 Off |                    0 |
| N/A   26C    P8               9W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
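Given the header above, a quick way to check a candidate base image against the evaluation nodes is to compare the image's CUDA version with the "CUDA Version" that nvidia-smi reports, which is the highest CUDA version the installed driver supports. The Python sketch below assumes image references follow the nvidia/cuda:<version>-... tag pattern used in this thread; the helper names (parse_version, driver_max_cuda, image_cuda, image_fits_driver) are hypothetical and not part of any official tooling.

```python
import re
from typing import Tuple

# Matches the header line printed by plain `nvidia-smi`, e.g.
# "| NVIDIA-SMI 535.161.08   Driver Version: 535.161.08   CUDA Version: 12.2 |"
_HEADER_RE = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")

# Assumption: the CUDA version sits right after the colon of an nvidia/cuda tag,
# e.g. "nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04" -> "12.1.0"
_TAG_RE = re.compile(r":(\d+(?:\.\d+)+)-")


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "12.1.0" into (12, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def driver_max_cuda(nvidia_smi_output: str) -> Tuple[str, Tuple[int, ...]]:
    """Return (driver version, highest CUDA version the driver reports)."""
    match = _HEADER_RE.search(nvidia_smi_output)
    if match is None:
        raise ValueError("no 'Driver Version ... CUDA Version' header found")
    return match.group(1), parse_version(match.group(2))


def image_cuda(image: str) -> Tuple[int, ...]:
    """Extract the CUDA version from an nvidia/cuda image reference."""
    match = _TAG_RE.search(image)
    if match is None:
        raise ValueError(f"cannot find a CUDA version in {image!r}")
    return parse_version(match.group(1))


def image_fits_driver(image: str, nvidia_smi_output: str) -> bool:
    """True if the image's CUDA major.minor is <= the driver's reported CUDA version."""
    _, host_cuda = driver_max_cuda(nvidia_smi_output)
    # Compare major.minor only: nvidia-smi reports "12.2", while the tag says "12.1.0".
    return image_cuda(image)[:2] <= host_cuda[:2]


if __name__ == "__main__":
    header = "| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |"
    for tag in ("nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04",
                "nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04"):
        print(tag, image_fits_driver(tag, header))
```

With driver 535.161.08 (CUDA 12.2), both the start-kit image (11.8) and the proposed 12.1 image pass this check. It is only a coarse version comparison: it says nothing about cuDNN, NCCL, or whether the T4 architecture is supported by a given build.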