Question
Why is `sudo apt upgrade` on the host OS required to make CUDA work in a Docker container? The problem does not occur without Docker; it occurs only when the Docker image is recreated.
Environment
- Ubuntu 22.04 LTS
- Docker version 26.0.1, build d260a54
Dockerfile
```dockerfile
#--------------------------------------------------------------------------------
# Dockerfile to build the base image with requirements and models downloaded.
#
# CUDA 11.7 and PyTorch 1.13.1 due to the Deepdoctection requirements.
# https://github.com/deepdoctection/deepdoctection#requirements
# The PyTorch version that satisfies 1.12 <= PyTorch < 2.0 is 1.13.1.
# https://pytorch.org/get-started/previous-versions/#v1130
#--------------------------------------------------------------------------------
FROM nvidia/cuda:11.7.1-devel-ubuntu22.04

# Create working directory
WORKDIR /home/eml

# Copy under code/python
COPY . .

# Note: every RUN command will create an image layer, increasing the image size.
#--------------------------------------------------------------------------------
# Ubuntu libs and Timezone (https://serverfault.com/q/949991).
# [deepdoctection dependency]
# - poppler
#   https://pdf2image.readthedocs.io/en/latest/installation.html#installing-poppler
# - tesseract-ocr
# - qpdf for encrypted PDF. See AIML-130.
#--------------------------------------------------------------------------------
ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Australia/Sydney
RUN apt -y update && \
    apt install -y tzdata \
        software-properties-common git cmake wget pkg-config tree ffmpeg libsm6 libxext6 \
        tesseract-ocr libtesseract-dev tesseract-ocr-eng poppler-utils qpdf jq gpustat \
    || exit 1

#--------------------------------------------------------------------------------
# Py3.10 libs
# https://launchpad.net/~deadsnakes/+archive/ubuntu/ppa
# https://askubuntu.com/a/1398569
# https://www.youtube.com/watch?v=Xe40amojaXE
#--------------------------------------------------------------------------------
RUN add-apt-repository --yes ppa:deadsnakes/ppa && \
    apt install -y python3.10 python3-pip build-essential libssl-dev libffi-dev python3-venv \
    || exit 1

#--------------------------------------------------------------------------------
# PyTorch/CUDA
# https://pytorch.org/get-started/previous-versions/#linux-and-windows-9
#--------------------------------------------------------------------------------
RUN pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 \
    --extra-index-url https://download.pytorch.org/whl/cu117

#--------------------------------------------------------------------------------
# Group/User
#--------------------------------------------------------------------------------
#RUN groupadd -g 2000 eml && \
#    useradd -rm -d /home/eml -s /bin/bash -g eml -u 2001 eml && \
#    chown -R eml:eml /home/eml

#--------------------------------------------------------------------------------
# Non-root user
# Causes issues, e.g.
# - mounted volume access check with os/pathlib does not work.
# - torch.cuda.is_available() becomes False.
# Need to research how to use a non-root user with file permissions, and GPU
# access with a non-root docker user.
#--------------------------------------------------------------------------------
USER eml
ENV PATH="${PATH}:${HOME}/.local/bin"

#--------------------------------------------------------------------------------
# Packages
#--------------------------------------------------------------------------------
RUN pip install -r ./requirements.txt && \
    python3 -m spacy download en_core_web_trf && \
    python3 -m nltk.downloader words && \
    python3 -m nltk.downloader wordnet && \
    huggingface-cli download sentence-transformers/gtr-t5-large \
    || exit 1

#--------------------------------------------------------------------------------
# Run the application
# https://stackoverflow.com/a/46245972/4281353
# > if you have a docker image where your script is the ENTRYPOINT, any arguments
# > you pass to the docker run command will be added to the entrypoint.
# > ```
# > docker run --rm <yourImageName> -a API_KEY -f FILENAME -o ORG_ID
# > ```
#--------------------------------------------------------------------------------
# Executable to run by this container is always Python3
ENTRYPOINT ["python3"]
```
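Since the `ENTRYPOINT` is `python3`, anything after the image name on the `docker run` command line becomes the Python argument list, which makes a one-line CUDA smoke test easy. A minimal sketch of the argument-append behaviour (the `eml-base` image name and the helper function are illustrative only, not part of the Dockerfile above):

```python
# Sketch: with the exec-form ENTRYPOINT, Docker appends the `docker run`
# arguments after the entrypoint, so this image always executes
# `python3 <whatever follows the image name>`.
def container_command(entrypoint, run_args):
    """Return the argv the container executes: ENTRYPOINT + run arguments."""
    return entrypoint + run_args

# `docker run --rm --gpus all eml-base -c "import torch; print(torch.cuda.is_available())"`
# (eml-base is a placeholder image name) would execute:
print(container_command(
    ["python3"],
    ["-c", "import torch; print(torch.cuda.is_available())"],
))
```

If that one-liner prints `False` inside the container while it prints `True` on the host, the breakage is specific to the container runtime, not to PyTorch itself.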
Problem
When the Docker image is re-created, PyTorch fails to detect CUDA until `sudo apt upgrade -y` and a reboot are done.
```
File "/usr/local/lib/python3.10/dist-packages/torch/storage.py", line 240, in _load_from_bytes
    return torch.load(io.BytesIO(b))
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 795, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1012, in _legacy_load
    result = unpickler.load()
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 958, in persistent_load
    wrap_storage=restore_location(obj, location),
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 215, in default_restore_location
    result = fn(storage, location)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 182, in _cuda_deserialize
    device = validate_cuda_device(location)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 166, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
```
It seems that CUDA or NVIDIA driver package updates in the apt repository cause the problem, by creating an incompatibility between the NVIDIA driver on the host OS and the CUDA toolkit inside the Docker container. But why?
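My current understanding of the suspected incompatibility, expressed as a check: the CUDA user-mode libraries baked into the container image require a minimum host kernel-driver version, so a container rebuilt against a newer CUDA stack can fail on a host whose driver has not been upgraded yet. A rough sketch; the minimum-driver figures are my reading of NVIDIA's CUDA release notes for Linux, and the helper names are mine:

```python
# Sketch: a container built for a given CUDA release only works if the
# host's NVIDIA kernel driver meets that release's minimum version.
# Figures below are taken (by me) from NVIDIA's CUDA release notes.
MIN_DRIVER_FOR_CUDA = {
    "11.7": (515, 43, 4),   # assumed minimum Linux driver for CUDA 11.7
    "11.8": (520, 61, 5),   # assumed minimum Linux driver for CUDA 11.8
}

def parse_version(text: str) -> tuple:
    """Turn a driver string like '515.43.04' into (515, 43, 4)."""
    return tuple(int(part) for part in text.split("."))

def driver_supports(cuda_version: str, host_driver: str) -> bool:
    """True if the host driver can run a container built for cuda_version."""
    return parse_version(host_driver) >= MIN_DRIVER_FOR_CUDA[cuda_version]

# A host driver predating the CUDA 11.7 minimum fails the check; after
# `apt upgrade` installs a newer driver (and a reboot loads it), it passes.
print(driver_supports("11.7", "510.108.03"))  # older host driver -> False
print(driver_supports("11.7", "515.105.01"))  # upgraded host driver -> True
```

If this model is right, `apt upgrade` fixes things because it pulls in a driver new enough for the freshly rebuilt container, and the reboot is needed to load the new kernel module. That would also explain why the problem only appears after the image is recreated.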
