
CUDNN 9 support #1780

Open · AndrewMead10 opened this issue Sep 11, 2024 · 10 comments
Labels: dependencies, python-release

@AndrewMead10 commented Sep 11, 2024

I'm currently trying to use whisperX (which uses faster-whisper, which uses CTranslate2), and I am getting the following error:

Could not locate cudnn_ops_infer64_8.dll. Please make sure it is in your library path!

From what I can tell, this is because cuDNN 9 is installed while the missing DLL belongs to cuDNN 8. This is an issue because PyTorch >= 2.4.0 is compiled against cuDNN 9.

See the discussion in SYSTRAN/faster-whisper#958.

The workaround right now is just to use a PyTorch version < 2.4.
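
For anyone hitting this, the pin and a quick check look something like the following; the version spec is just the workaround above expressed for pip, so adjust it to your setup:

pip install "torch<2.4"

# torch reports the cuDNN it was built against:
# a 4-digit number (e.g. 8902) means cuDNN 8, a 5-digit one (e.g. 90100) means cuDNN 9
python -c "import torch; print(torch.backends.cudnn.version())"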

@minhthuc2502 (Collaborator)

I'm considering whether upgrading for PyTorch >= 2.4.0 is necessary at this time, as I'd like to avoid impacting users on cuDNN 8. However, it might be better to follow PyTorch and upgrade to cuDNN 9.

@minhthuc2502 added the dependencies and python-release labels on Sep 12, 2024
@drake7707

Just a heads up, I was able to compile it successfully against CUDA 12.4 and cuDNN 9 without any code changes.

I used the pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel Docker image and updated the Dockerfile somewhat, then copied /opt/ctranslate2 to my runtime and installed the wheel there, and it works without an issue (I needed faster-whisper).

@fedirz commented Sep 27, 2024

> Just a heads up, I was able to compile it successfully against CUDA 12.4 and cuDNN 9 without any code changes.
>
> I used the pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel Docker image and updated the Dockerfile somewhat, then copied /opt/ctranslate2 to my runtime and installed the wheel there, and it works without an issue (I needed faster-whisper).

Hey, would you mind sharing your Dockerfile and any additional relevant commands you've used? I'm trying to switch the faster-whisper-server project over to the latest CUDA with cuDNN 9. Thanks!

@drake7707 commented Sep 27, 2024

#FROM nvidia/cuda:12.1.0-devel-ubuntu20.04 as builder
FROM pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel as builder

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        python3-dev \
        python3-pip \
        wget \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /root

ENV ONEAPI_VERSION=2023.0.0
RUN wget -q https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB && \
    apt-key add *.PUB && \
    rm *.PUB && \
    echo "deb https://apt.repos.intel.com/oneapi all main" > /etc/apt/sources.list.d/oneAPI.list && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
        intel-oneapi-mkl-devel-$ONEAPI_VERSION \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN python3 -m pip --no-cache-dir install cmake==3.22.*

ENV ONEDNN_VERSION=3.1.1
RUN wget -q https://github.com/oneapi-src/oneDNN/archive/refs/tags/v${ONEDNN_VERSION}.tar.gz && \
    tar xf *.tar.gz && \
    rm *.tar.gz && \
    cd oneDNN-* && \
    cmake -DCMAKE_BUILD_TYPE=Release -DONEDNN_LIBRARY_TYPE=STATIC -DONEDNN_BUILD_EXAMPLES=OFF -DONEDNN_BUILD_TESTS=OFF -DONEDNN_ENABLE_WORKLOAD=INFERENCE -DONEDNN_ENABLE_PRIMITIVE="CONVOLUTION;REORDER" -DONEDNN_BUILD_GRAPH=OFF . && \
    make -j$(nproc) install && \
    cd .. && \
    rm -r oneDNN-*

ENV OPENMPI_VERSION=4.1.6
RUN wget -q https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-${OPENMPI_VERSION}.tar.bz2 && \
    tar xf *.tar.bz2 && \
    rm *.tar.bz2 && \
    cd openmpi-* && \
    ./configure && \
    make -j$(nproc) install && \
    cd .. && \
    rm -r openmpi-*

# install the cuDNN 9 development package (headers needed for the build)
RUN apt-get update && apt-get install -y libcudnn9-dev-cuda-12

COPY third_party third_party
COPY cli cli
COPY include include
COPY src src
COPY cmake cmake
COPY python python
COPY CMakeLists.txt .

ARG CXX_FLAGS
ENV CXX_FLAGS=${CXX_FLAGS:-"-msse4.1"}
ARG CUDA_NVCC_FLAGS
ENV CUDA_NVCC_FLAGS=${CUDA_NVCC_FLAGS:-"-Xfatbin=-compress-all"}
ARG CUDA_ARCH_LIST
ENV CUDA_ARCH_LIST=${CUDA_ARCH_LIST:-"Common"}
ENV CTRANSLATE2_ROOT=/opt/ctranslate2
ENV LD_LIBRARY_PATH=/usr/local/lib/:${LD_LIBRARY_PATH}

RUN mkdir build_tmp && \
    cd build_tmp && \
    cmake -DCMAKE_INSTALL_PREFIX=${CTRANSLATE2_ROOT} \
          -DWITH_CUDA=ON -DWITH_CUDNN=ON -DWITH_MKL=ON -DWITH_DNNL=ON -DOPENMP_RUNTIME=COMP \
          -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="${CXX_FLAGS}" \
          -DCUDA_NVCC_FLAGS="${CUDA_NVCC_FLAGS}" -DCUDA_ARCH_LIST="${CUDA_ARCH_LIST}" -DWITH_TENSOR_PARALLEL=ON .. && \
    VERBOSE=1 make -j$(nproc) install

ENV LANG=en_US.UTF-8
COPY README.md .

RUN cd python && \
    python3 -m pip --no-cache-dir install -r install_requirements.txt && \
    python3 setup.py bdist_wheel --dist-dir $CTRANSLATE2_ROOT


ENV CTRANSLATE2_ROOT=/opt/ctranslate2
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CTRANSLATE2_ROOT/lib

#COPY --from=builder $CTRANSLATE2_ROOT $CTRANSLATE2_ROOT
RUN python3 -m pip --no-cache-dir install $CTRANSLATE2_ROOT/*.whl
#&& \
#    rm $CTRANSLATE2_ROOT/*.whl

ENTRYPOINT ["/opt/ctranslate2/bin/ct2-translator"]

Build it with `docker build --progress plain -f Dockerfile .`.

If you have problems, I've pushed the image to the Docker registry as drake7707/ctranslate2-cudnn9. You can copy /opt/ctranslate2 out of it into your own image. Don't forget to add it to LD_LIBRARY_PATH and install the built wheel (also in /opt/ctranslate2). I didn't have to change anything else to get faster-whisper to work.
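
Pulling the artifacts out of that image should look roughly like this (the container name is arbitrary):

# copy /opt/ctranslate2 out of the prebuilt image
docker create --name ct2-build drake7707/ctranslate2-cudnn9
docker cp ct2-build:/opt/ctranslate2 ./ctranslate2
docker rm ct2-build

# make the shared libraries visible, then install the wheel that sits next to them
export LD_LIBRARY_PATH="$PWD/ctranslate2/lib:$LD_LIBRARY_PATH"
pip install ./ctranslate2/*.whl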

The Dockerfile is mostly the same. I got a circular dependency with the multi-stage build for some reason, and I couldn't spot the issue quickly, so I did away with that. I had to install the cuDNN 9 dev kit with RUN apt-get update && apt-get install -y libcudnn9-dev-cuda-12, and I made sure not to remove the wheel file so I could still copy it out. On the runtime side I had an issue where libstdc++ was outdated in my runtime container's Anaconda environment (the build links against a newer version here), but a conda install -c conda-forge libstdcxx-ng=12 --yes fixed that.

@jhj0517 commented Oct 4, 2024

Regarding faster-whisper, I was able to reproduce the same bug on torch >= 2.4.0.

According to pytorch/pytorch#100974, torch bundles its own cuDNN, and torch >= 2.4.0 is therefore incompatible with CTranslate2.

It would be great if CTranslate2 supported cuDNN 9 so I could use it on torch >= 2.4.0.
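
As a diagnostic on Linux, it can help to confirm which cuDNN wheels the environment actually ships and put them on the loader path. This is just a sketch and assumes the nvidia-cudnn-cu12 wheel that recent torch builds pull in:

# list the cuDNN wheels pip installed alongside torch
pip list | grep -i cudnn

# prepend their lib directory so native extensions can resolve them
export LD_LIBRARY_PATH="$(python -c 'import os, nvidia.cudnn.lib; print(os.path.dirname(nvidia.cudnn.lib.__file__))'):$LD_LIBRARY_PATH"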

@kittsil commented Oct 7, 2024

@drake7707's great Dockerfile worked for me, except that:

  1. I am running Python 3.10, whereas the base image has Python 3.11. I explicitly installed python3.10 with apt-get and then used it for all Python commands after the cmake build.
  2. I wanted to be able to install the wheel with pip in a venv and have it "just work." To do that, I needed the shared binaries in the wheel, so I used auditwheel to "repair" it.

After those changes (at the bottom), I was able to get the wheel out of the docker container and install it as a dependency with pip:

(.venv) $ docker build . -t drake7707/ctranslate2-cudnn9:python3.10
(.venv) $ docker run -it --rm -v ./outputdir:/opt/share/outputdir drake7707/ctranslate2-cudnn9:python3.10
root # cp /opt/ctranslate2/ctranslate2-*.whl /opt/share/outputdir/
root # exit
(.venv) $ pip install outputdir/ctranslate2-*.whl
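
To sanity-check the repaired wheel before depending on it, something like this works (paths follow the commands above):

# show the platform tag and the shared libraries auditwheel bundled into the wheel
auditwheel show outputdir/ctranslate2-*.whl

# smoke-test the install inside the venv
python -c "import ctranslate2; print(ctranslate2.__version__)"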

Here is my Dockerfile, mostly what @drake7707 originally wrote:

FROM pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel as builder

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    python3.10-dev \
    python3-dev \
    python3-pip \
    wget \
    && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /root

ENV ONEAPI_VERSION=2023.0.0
RUN wget -q https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB && \
    apt-key add *.PUB && \
    rm *.PUB && \
    echo "deb https://apt.repos.intel.com/oneapi all main" > /etc/apt/sources.list.d/oneAPI.list && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
    intel-oneapi-mkl-devel-$ONEAPI_VERSION \
    && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN python3 -m pip --no-cache-dir install cmake==3.22.*

ENV ONEDNN_VERSION=3.1.1
RUN wget -q https://github.com/oneapi-src/oneDNN/archive/refs/tags/v${ONEDNN_VERSION}.tar.gz && \
    tar xf *.tar.gz && \
    rm *.tar.gz && \
    cd oneDNN-* && \
    cmake -DCMAKE_BUILD_TYPE=Release -DONEDNN_LIBRARY_TYPE=STATIC -DONEDNN_BUILD_EXAMPLES=OFF -DONEDNN_BUILD_TESTS=OFF -DONEDNN_ENABLE_WORKLOAD=INFERENCE -DONEDNN_ENABLE_PRIMITIVE="CONVOLUTION;REORDER" -DONEDNN_BUILD_GRAPH=OFF . && \
    make -j$(nproc) install && \
    cd .. && \
    rm -r oneDNN-*

ENV OPENMPI_VERSION=4.1.6
RUN wget -q https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-${OPENMPI_VERSION}.tar.bz2 && \
    tar xf *.tar.bz2 && \
    rm *.tar.bz2 && \
    cd openmpi-* && \
    ./configure && \
    make -j$(nproc) install && \
    cd .. && \
    rm -r openmpi-*

RUN apt-get update && apt-get install -y libcudnn9-dev-cuda-12

COPY third_party third_party
COPY cli cli
COPY include include
COPY src src
COPY cmake cmake
COPY python python
COPY CMakeLists.txt .

ARG CXX_FLAGS
ENV CXX_FLAGS=${CXX_FLAGS:-"-msse4.1"}
ARG CUDA_NVCC_FLAGS
ENV CUDA_NVCC_FLAGS=${CUDA_NVCC_FLAGS:-"-Xfatbin=-compress-all"}
ARG CUDA_ARCH_LIST
ENV CUDA_ARCH_LIST=${CUDA_ARCH_LIST:-"Common"}
ENV CTRANSLATE2_ROOT=/opt/ctranslate2
ENV LD_LIBRARY_PATH=/usr/local/lib/:${LD_LIBRARY_PATH}

RUN mkdir build_tmp && \
    cd build_tmp && \
    cmake -DCMAKE_INSTALL_PREFIX=${CTRANSLATE2_ROOT} \
    -DWITH_CUDA=ON -DWITH_CUDNN=ON -DWITH_MKL=ON -DWITH_DNNL=ON -DOPENMP_RUNTIME=COMP \
    -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="${CXX_FLAGS}" \
    -DCUDA_NVCC_FLAGS="${CUDA_NVCC_FLAGS}" -DCUDA_ARCH_LIST="${CUDA_ARCH_LIST}" -DWITH_TENSOR_PARALLEL=ON .. && \
    VERBOSE=1 make -j$(nproc) install

ENV LANG=en_US.UTF-8
COPY README.md .

RUN cd python && \
    python3.10 -m pip --no-cache-dir install -r install_requirements.txt && \
    python3.10 setup.py bdist_wheel --dist-dir $CTRANSLATE2_ROOT

ENV CTRANSLATE2_ROOT=/opt/ctranslate2
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CTRANSLATE2_ROOT/lib

RUN python3.10 -m pip --no-cache-dir install auditwheel && \
    auditwheel repair --plat linux_x86_64 $CTRANSLATE2_ROOT/*.whl && \
    cp /root/wheelhouse/ctranslate2-*.whl ${CTRANSLATE2_ROOT}/

CMD ["bash"]

@Jiltseb commented Oct 8, 2024

@minhthuc2502 Could you please let us know if there is an update on this and an estimated timeline?

@minhthuc2502 (Collaborator)

Hello, I will upgrade to cuDNN 9 for the next release. Currently you can build CTranslate2 with cuDNN 9 without any problem.
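
For anyone who wants to try before the release, a minimal from-source build along those lines might look like this; it assumes CUDA 12.x and the cuDNN 9 dev package are already installed, and the flags mirror the Dockerfiles above:

# build and install the native library
git clone --recursive https://github.com/OpenNMT/CTranslate2.git
cd CTranslate2 && mkdir build && cd build
cmake -DWITH_CUDA=ON -DWITH_CUDNN=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local ..
make -j$(nproc) && sudo make install

# build and install the Python wheel; setup.py locates the native library via CTRANSLATE2_ROOT
export CTRANSLATE2_ROOT=/usr/local
cd ../python
pip install -r install_requirements.txt
python setup.py bdist_wheel
pip install dist/*.whl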

@BBC-Esq commented Oct 17, 2024

When will the next release be? I don't see any proposed pull requests regarding cuDNN 9+ support yet. A lot of libraries now require it...

@MahmoudAshraf97 (Contributor)

#1803
