After several days of debugging, we finally made a successful submission (online inference only). For teams that need to build their own Docker image, here is a minimal template:
FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
# The host driver version is 450.172.01, which only supports CUDA 11.0.3 or earlier. Check Docker Hub for other base images.
RUN apt-get update && apt-get upgrade -y && apt-get install -y python3 python3-pip && apt-get clean && rm -rf /var/lib/apt/lists/*
RUN pip3 install <some package>
RUN ln -s /usr/bin/python3 /usr/bin/python
COPY models/* /models/
# Put all necessary code into utils. Depending on the operating system, the Python version (and therefore this path) may differ.
COPY utils /usr/local/lib/python3.6/dist-packages/starter_kit/
# The online submission environment requires running as a non-root user named aicrowd. If the container runs as root, the submission can pass the public phase but will fail in the private phase. My guess is that the system tries to delete files generated during the public phase and fails for lack of permission.
ENV USER aicrowd
ENV HOME /home/aicrowd
RUN groupadd --gid 1001 aicrowd
RUN useradd --comment "Default user" --create-home --gid 1001 --no-log-init --shell /bin/bash --uid 1001 aicrowd
USER aicrowd
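Two optional build-time checks could be appended after `USER aicrowd` in the template above so that problems show up during a local `docker build` rather than after submission. This is a sketch that is not part of the original template; it assumes the `python` symlink from the template exists. The import check will also fail if `utils` itself imports packages that are not installed in the image.

```dockerfile
# Optional sanity checks (not part of the original template).
# Fails the build if the final user is still root (uid 0).
RUN test "$(id -u)" -ne 0
# Fails the build if the copied utils directory cannot be imported as starter_kit,
# for example because the python3.X in the COPY destination does not match the image's Python.
RUN python -c "import starter_kit"
```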
Hello, should the configuration you showed be written in a Dockerfile? I really don't know how to start building my environment. I'm looking forward to your reply!
As I said in the note, this configuration only works for the public test phase and will fail during the private test phase. I am also waiting for a reply from the AIcrowd team.
Thank you very much for exploring this and sharing it. I have a few comments that may be helpful to others.
Actually, I found that the 450 driver can support all CUDA 11.x versions. Also, if you only use PyTorch, you don't need to install CUDA yourself, since the PyTorch wheel bundles its own CUDA runtime libraries (which is why the PyTorch .whl file is so large).
Yes, it is said that the 450 driver supports all CUDA 11.x versions, such as 11.4. But I first tested 11.4, 11.3, and 11.2 locally with the 450 driver, and only 11.0.3 worked. That is why I suggest 11.0.3 as the highest version.
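If you want a CUDA 11 base image instead of 10.1, the highest version reported to work here with the 450 driver is 11.0.3. A sketch of the corresponding `FROM` line is below. The exact tag is an assumption based on NVIDIA's usual `<version>-cudnn<N>-runtime-<os>` naming, so confirm on Docker Hub that it is still published before relying on it.

```dockerfile
# CUDA 11.0.3 base image (tag assumed; verify it exists on Docker Hub).
# 11.0.3 is the highest version that worked locally with the 450 host driver.
FROM nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu18.04
```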
FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
RUN apt-get update && apt-get upgrade -y && apt-get install -y python3 python3-pip && apt-get clean && rm -rf /var/lib/apt/lists/*
RUN pip install pandas
RUN ln -s /usr/bin/python3 /usr/bin/python
COPY models/* /models/
COPY utils /usr/local/lib/python3.6/dist-packages/starter_kit/
# The online submission environment requires running as a non-root user named aicrowd. If the container runs as root, the submission can pass the public phase but will fail in the private phase. My guess is that the system tries to delete files generated during the public phase and fails for lack of permission.
ENV USER aicrowd
ENV HOME /home/aicrowd
RUN groupadd --gid 1001 aicrowd
RUN useradd --comment "Default user" --create-home --gid 1001 --no-log-init --shell /bin/bash --uid 1001 aicrowd
USER aicrowd
My repository structure is the same as yours, but the build fails. Can you help me figure out why?
I tested your Dockerfile locally, and it reported that pip was not found. This is specific to the operating system: on Ubuntu 18.04 the command is pip3 by default. After switching to pip3 install for the packages, the problem was solved. I suggest testing your image locally before submitting it.
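Applied to the Dockerfile above, the fix amounts to changing the install line. The commented `python3 -m pip` form is an equivalent alternative that always uses the pip belonging to `python3`.

```dockerfile
# On Ubuntu 18.04 the python3-pip package provides the `pip3` command, not `pip`
RUN pip3 install pandas
# Equivalent alternative:
# RUN python3 -m pip install pandas
```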
Hi,
When you test locally, does this command work?
COPY utils /usr/local/lib/python3.6/dist-packages/starter_kit/
For me, it says:
COPY failed: file not found in build context or excluded by .dockerignore: stat usr/local/lib/python3.7/dist-packages/starter_kit/: file does not exist
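For reference, the semantics of `COPY` relevant to this error can be sketched as follows. Every argument except the last is a source resolved inside the build context (the directory passed to `docker build`). Only the last argument is a destination path inside the image. The `RUN python3 --version` line is an optional addition that prints the interpreter version during the build, so you can check whether the destination should say `python3.6`, `python3.7`, and so on.

```dockerfile
# COPY <src>... <dest>
# <src> paths are relative to the build context; `utils` must exist at its top level
# and must not be excluded by .dockerignore.
# <dest> is an absolute path inside the image and does not need to exist beforehand.
COPY utils /usr/local/lib/python3.6/dist-packages/starter_kit/
# Optional: print the image's Python version to confirm the dist-packages path above
RUN python3 --version
```

A version-independent alternative, if hard-coding `python3.X` keeps causing trouble, is to copy the package into a fixed directory (for example a hypothetical `/opt/submission/starter_kit/`) and add `/opt/submission` to `PYTHONPATH` with an `ENV` instruction.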