
CUDNN 9 support #1780

Open · AndrewMead10 opened this issue Sep 11, 2024 · 10 comments
Labels: dependencies, python-release

@AndrewMead10 commented Sep 11, 2024

I'm currently trying to use whisperX (which uses faster-whisper, which uses CTranslate2), and I am getting the following error:

Could not locate cudnn_ops_infer64_8.dll. Please make sure it is in your library path!

From what I can tell, this is because cuDNN 9 is installed while the missing DLL belongs to cuDNN 8. This is an issue because PyTorch >= 2.4.0 is compiled against cuDNN 9.

See the discussion in SYSTRAN/faster-whisper#958.

The workaround right now is just to use a PyTorch version < 2.4.
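
For anyone hitting this, the pin and a quick check look something like the following; the version spec is just the workaround above expressed for pip, so adjust it to your setup:

pip install "torch<2.4"

# torch reports the cuDNN it was built against:
# a 4-digit number (e.g. 8902) means cuDNN 8, a 5-digit one (e.g. 90100) means cuDNN 9
python -c "import torch; print(torch.backends.cudnn.version())"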

@minhthuc2502 (Collaborator)

I'm considering whether upgrading for PyTorch >= 2.4.0 is necessary at this time, as I'd like to avoid impacting users on cuDNN 8. However, it might be better to follow PyTorch and upgrade to cuDNN 9.

@minhthuc2502 added the dependencies and python-release labels on Sep 12, 2024
@drake7707

Just a heads up, I was able to compile it successfully against CUDA 12.4 and cuDNN 9 without any code changes.

I used the pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel Docker image and updated the Dockerfile somewhat, then copied /opt/ctranslate2 to my runtime and installed the wheel there, and it works without an issue (I needed faster-whisper).

@fedirz commented Sep 27, 2024

> Just a heads up, I was able to compile it successfully against CUDA 12.4 and cuDNN 9 without any code changes.
>
> I used the pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel Docker image and updated the Dockerfile somewhat, then copied /opt/ctranslate2 to my runtime and installed the wheel there, and it works without an issue (I needed faster-whisper).

Hey, would you mind sharing your Dockerfile and any additional relevant commands you've used? I'm trying to switch the faster-whisper-server project over to the latest CUDA with cuDNN 9. Thanks!

@drake7707 commented Sep 27, 2024

#FROM nvidia/cuda:12.1.0-devel-ubuntu20.04 as builder
FROM pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel as builder

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        python3-dev \
        python3-pip \
        wget \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /root

ENV ONEAPI_VERSION=2023.0.0
RUN wget -q https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB && \
    apt-key add *.PUB && \
    rm *.PUB && \
    echo "deb https://apt.repos.intel.com/oneapi all main" > /etc/apt/sources.list.d/oneAPI.list && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
        intel-oneapi-mkl-devel-$ONEAPI_VERSION \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN python3 -m pip --no-cache-dir install cmake==3.22.*

ENV ONEDNN_VERSION=3.1.1
RUN wget -q https://github.com/oneapi-src/oneDNN/archive/refs/tags/v${ONEDNN_VERSION}.tar.gz && \
    tar xf *.tar.gz && \
    rm *.tar.gz && \
    cd oneDNN-* && \
    cmake -DCMAKE_BUILD_TYPE=Release -DONEDNN_LIBRARY_TYPE=STATIC -DONEDNN_BUILD_EXAMPLES=OFF -DONEDNN_BUILD_TESTS=OFF -DONEDNN_ENABLE_WORKLOAD=INFERENCE -DONEDNN_ENABLE_PRIMITIVE="CONVOLUTION;REORDER" -DONEDNN_BUILD_GRAPH=OFF . && \
    make -j$(nproc) install && \
    cd .. && \
    rm -r oneDNN-*

ENV OPENMPI_VERSION=4.1.6
RUN wget -q https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-${OPENMPI_VERSION}.tar.bz2 && \
    tar xf *.tar.bz2 && \
    rm *.tar.bz2 && \
    cd openmpi-* && \
    ./configure && \
    make -j$(nproc) install && \
    cd .. && \
    rm -r openmpi-*

# install the cuDNN 9 development package (headers needed for the build)
RUN apt-get update && apt-get install -y libcudnn9-dev-cuda-12

COPY third_party third_party
COPY cli cli
COPY include include
COPY src src
COPY cmake cmake
COPY python python
COPY CMakeLists.txt .

ARG CXX_FLAGS
ENV CXX_FLAGS=${CXX_FLAGS:-"-msse4.1"}
ARG CUDA_NVCC_FLAGS
ENV CUDA_NVCC_FLAGS=${CUDA_NVCC_FLAGS:-"-Xfatbin=-compress-all"}
ARG CUDA_ARCH_LIST
ENV CUDA_ARCH_LIST=${CUDA_ARCH_LIST:-"Common"}
ENV CTRANSLATE2_ROOT=/opt/ctranslate2
ENV LD_LIBRARY_PATH=/usr/local/lib/:${LD_LIBRARY_PATH}

RUN mkdir build_tmp && \
    cd build_tmp && \
    cmake -DCMAKE_INSTALL_PREFIX=${CTRANSLATE2_ROOT} \
          -DWITH_CUDA=ON -DWITH_CUDNN=ON -DWITH_MKL=ON -DWITH_DNNL=ON -DOPENMP_RUNTIME=COMP \
          -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="${CXX_FLAGS}" \
          -DCUDA_NVCC_FLAGS="${CUDA_NVCC_FLAGS}" -DCUDA_ARCH_LIST="${CUDA_ARCH_LIST}" -DWITH_TENSOR_PARALLEL=ON .. && \
    VERBOSE=1 make -j$(nproc) install

ENV LANG=en_US.UTF-8
COPY README.md .

RUN cd python && \
    python3 -m pip --no-cache-dir install -r install_requirements.txt && \
    python3 setup.py bdist_wheel --dist-dir $CTRANSLATE2_ROOT


ENV CTRANSLATE2_ROOT=/opt/ctranslate2
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CTRANSLATE2_ROOT/lib

#COPY --from=builder $CTRANSLATE2_ROOT $CTRANSLATE2_ROOT
RUN python3 -m pip --no-cache-dir install $CTRANSLATE2_ROOT/*.whl
#&& \
#    rm $CTRANSLATE2_ROOT/*.whl

ENTRYPOINT ["/opt/ctranslate2/bin/ct2-translator"]

Build it with `docker build --progress plain -f Dockerfile .`.

If you have problems, I've pushed the image to the Docker registry as drake7707/ctranslate2-cudnn9. You can copy /opt/ctranslate2 out of it into your own image. Don't forget to add it to LD_LIBRARY_PATH and install the built wheel (also in /opt/ctranslate2). I didn't have to change anything else to get faster-whisper to work.
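
Pulling the artifacts out of that image should look roughly like this (the container name is arbitrary):

# copy /opt/ctranslate2 out of the prebuilt image
docker create --name ct2-build drake7707/ctranslate2-cudnn9
docker cp ct2-build:/opt/ctranslate2 ./ctranslate2
docker rm ct2-build

# make the shared libraries visible, then install the wheel that sits next to them
export LD_LIBRARY_PATH="$PWD/ctranslate2/lib:$LD_LIBRARY_PATH"
pip install ./ctranslate2/*.whl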

The Dockerfile is mostly the same. I got a circular dependency with the multi-stage build for some reason, and I couldn't spot the issue quickly, so I did away with that. I had to install the cuDNN 9 dev kit with RUN apt-get update && apt-get install -y libcudnn9-dev-cuda-12, and I made sure not to remove the wheel file so I could still copy it out. On the runtime side I had an issue where libstdc++ was outdated in my runtime container's Anaconda environment (the build links against a newer version here), but a conda install -c conda-forge libstdcxx-ng=12 --yes fixed that.

@jhj0517 commented Oct 4, 2024

Regarding faster-whisper, I was able to reproduce the same bug on torch >= 2.4.0.

According to pytorch/pytorch#100974, torch bundles its own cuDNN, and torch >= 2.4.0 is therefore incompatible with CTranslate2.

It would be great if CTranslate2 supported cuDNN 9 so I could use it on torch >= 2.4.0.
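
As a diagnostic on Linux, it can help to confirm which cuDNN wheels the environment actually ships and put them on the loader path. This is just a sketch and assumes the nvidia-cudnn-cu12 wheel that recent torch builds pull in:

# list the cuDNN wheels pip installed alongside torch
pip list | grep -i cudnn

# prepend their lib directory so native extensions can resolve them
export LD_LIBRARY_PATH="$(python -c 'import os, nvidia.cudnn.lib; print(os.path.dirname(nvidia.cudnn.lib.__file__))'):$LD_LIBRARY_PATH"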

@kittsil commented Oct 7, 2024

@drake7707's great Dockerfile worked for me, except that:

  1. I am running Python 3.10, whereas the base image has Python 3.11. I explicitly installed python3.10 with apt-get and then used it for all Python commands after the cmake build.
  2. I wanted to be able to install the wheel with pip in a venv and have it "just work." To do that, I needed the shared binaries in the wheel, so I used auditwheel to "repair" it.

After those changes (at the bottom), I was able to get the wheel out of the docker container and install it as a dependency with pip:

(.venv) $ docker build . -t drake7707/ctranslate2-cudnn9:python3.10
(.venv) $ docker run -it --rm -v ./outputdir:/opt/share/outputdir drake7707/ctranslate2-cudnn9:python3.10
root # cp /opt/ctranslate2/ctranslate2-*.whl /opt/share/outputdir/
root # exit
(.venv) $ pip install outputdir/ctranslate2-*.whl
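
To sanity-check the repaired wheel before depending on it, something like this works (paths follow the commands above):

# show the platform tag and the shared libraries auditwheel bundled into the wheel
auditwheel show outputdir/ctranslate2-*.whl

# smoke-test the install inside the venv
python -c "import ctranslate2; print(ctranslate2.__version__)"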

Here is my Dockerfile, mostly what @drake7707 originally wrote:

FROM pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel as builder

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    python3.10-dev \
    python3-dev \
    python3-pip \
    wget \
    && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /root

ENV ONEAPI_VERSION=2023.0.0
RUN wget -q https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB && \
    apt-key add *.PUB && \
    rm *.PUB && \
    echo "deb https://apt.repos.intel.com/oneapi all main" > /etc/apt/sources.list.d/oneAPI.list && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
    intel-oneapi-mkl-devel-$ONEAPI_VERSION \
    && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN python3 -m pip --no-cache-dir install cmake==3.22.*

ENV ONEDNN_VERSION=3.1.1
RUN wget -q https://github.com/oneapi-src/oneDNN/archive/refs/tags/v${ONEDNN_VERSION}.tar.gz && \
    tar xf *.tar.gz && \
    rm *.tar.gz && \
    cd oneDNN-* && \
    cmake -DCMAKE_BUILD_TYPE=Release -DONEDNN_LIBRARY_TYPE=STATIC -DONEDNN_BUILD_EXAMPLES=OFF -DONEDNN_BUILD_TESTS=OFF -DONEDNN_ENABLE_WORKLOAD=INFERENCE -DONEDNN_ENABLE_PRIMITIVE="CONVOLUTION;REORDER" -DONEDNN_BUILD_GRAPH=OFF . && \
    make -j$(nproc) install && \
    cd .. && \
    rm -r oneDNN-*

ENV OPENMPI_VERSION=4.1.6
RUN wget -q https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-${OPENMPI_VERSION}.tar.bz2 && \
    tar xf *.tar.bz2 && \
    rm *.tar.bz2 && \
    cd openmpi-* && \
    ./configure && \
    make -j$(nproc) install && \
    cd .. && \
    rm -r openmpi-*

RUN apt-get update && apt-get install -y libcudnn9-dev-cuda-12

COPY third_party third_party
COPY cli cli
COPY include include
COPY src src
COPY cmake cmake
COPY python python
COPY CMakeLists.txt .

ARG CXX_FLAGS
ENV CXX_FLAGS=${CXX_FLAGS:-"-msse4.1"}
ARG CUDA_NVCC_FLAGS
ENV CUDA_NVCC_FLAGS=${CUDA_NVCC_FLAGS:-"-Xfatbin=-compress-all"}
ARG CUDA_ARCH_LIST
ENV CUDA_ARCH_LIST=${CUDA_ARCH_LIST:-"Common"}
ENV CTRANSLATE2_ROOT=/opt/ctranslate2
ENV LD_LIBRARY_PATH=/usr/local/lib/:${LD_LIBRARY_PATH}

RUN mkdir build_tmp && \
    cd build_tmp && \
    cmake -DCMAKE_INSTALL_PREFIX=${CTRANSLATE2_ROOT} \
    -DWITH_CUDA=ON -DWITH_CUDNN=ON -DWITH_MKL=ON -DWITH_DNNL=ON -DOPENMP_RUNTIME=COMP \
    -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="${CXX_FLAGS}" \
    -DCUDA_NVCC_FLAGS="${CUDA_NVCC_FLAGS}" -DCUDA_ARCH_LIST="${CUDA_ARCH_LIST}" -DWITH_TENSOR_PARALLEL=ON .. && \
    VERBOSE=1 make -j$(nproc) install

ENV LANG=en_US.UTF-8
COPY README.md .

RUN cd python && \
    python3.10 -m pip --no-cache-dir install -r install_requirements.txt && \
    python3.10 setup.py bdist_wheel --dist-dir $CTRANSLATE2_ROOT

ENV CTRANSLATE2_ROOT=/opt/ctranslate2
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CTRANSLATE2_ROOT/lib

RUN python3.10 -m pip --no-cache-dir install auditwheel && \
    auditwheel repair --plat linux_x86_64 $CTRANSLATE2_ROOT/*.whl && \
    cp /root/wheelhouse/ctranslate2-*.whl ${CTRANSLATE2_ROOT}/

CMD ["bash"]

@Jiltseb commented Oct 8, 2024

@minhthuc2502 Could you please let us know if there is an update on this and an estimated timeline?

@minhthuc2502 (Collaborator)

Hello, I will upgrade to cuDNN 9 for the next release. Currently you can build CTranslate2 with cuDNN 9 without any problem.
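
For anyone who wants to try before the release, a minimal from-source build along those lines might look like this; it assumes CUDA 12.x and the cuDNN 9 dev package are already installed, and the flags mirror the Dockerfiles above:

# build and install the native library
git clone --recursive https://github.com/OpenNMT/CTranslate2.git
cd CTranslate2 && mkdir build && cd build
cmake -DWITH_CUDA=ON -DWITH_CUDNN=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local ..
make -j$(nproc) && sudo make install

# build and install the Python wheel; setup.py locates the native library via CTRANSLATE2_ROOT
export CTRANSLATE2_ROOT=/usr/local
cd ../python
pip install -r install_requirements.txt
python setup.py bdist_wheel
pip install dist/*.whl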

@BBC-Esq commented Oct 17, 2024

When will the next release be? I don't see any proposed pull requests regarding cuDNN 9+ support yet. A lot of libraries now require it...

@MahmoudAshraf97 (Contributor)

#1803
