cargo run --example 01-allocate error: Error: DriverError(CUDA_ERROR_NO_DEVICE, "no CUDA-capable device is detected") #300

Open
phial3 opened this issue Nov 22, 2024 · 11 comments

Comments

@phial3

phial3 commented Nov 22, 2024

What's wrong with it?

root@73624c6f4c4e:/code/cudarc# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

root@73624c6f4c4e:/code/cudarc# cargo run --example  01-allocate --features cuda-12020
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.03s
     Running `target/debug/examples/01-allocate`
Error: DriverError(CUDA_ERROR_NO_DEVICE, "no CUDA-capable device is detected")
@HVisMyLife

Try running nvidia-smi. If it fails, your GPU isn't connected, or you lack drivers, etc.
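
The same check can be scripted. Below is a minimal sketch, assuming only the Rust standard library; `tool_succeeds` and `driver_stack_responds` are hypothetical helper names, not part of cudarc. It only reports whether nvidia-smi starts and exits with status 0, which is a coarse signal and not proof that the CUDA driver API will work from another process.

```rust
use std::process::{Command, Stdio};

/// Returns true if `program` can be started and exits with status 0.
/// A missing binary or a non-zero exit status both yield false.
fn tool_succeeds(program: &str) -> bool {
    Command::new(program)
        .stdout(Stdio::null()) // discard the tool's normal output
        .stderr(Stdio::null()) // discard its error output as well
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

/// Hypothetical helper: coarse check that the NVIDIA driver stack responds.
fn driver_stack_responds() -> bool {
    tool_succeeds("nvidia-smi")
}
```

Calling `driver_stack_responds()` before creating a device gives an early, readable failure on machines or containers where the driver is not reachable.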

@phial3
Author

phial3 commented Nov 28, 2024

Try running nvidia-smi. If it fails, your GPU isn't connected, or you lack drivers, etc.

The GPU driver is OK, so I don't know why it doesn't work:

[root@mh-server-88 cudarc]# cargo run --example 01-allocate --features cuda-12020
   Compiling cudarc v0.12.1 (/home/temp/test/cudarc)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 13.77s
     Running `target/debug/examples/01-allocate`
Error: DriverError(CUDA_ERROR_STUB_LIBRARY, "<Failure when calling cuGetErrorString()>")

[root@mh-server-88 cudarc]# nvidia-smi
Thu Nov 28 15:20:36 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.113.01             Driver Version: 535.113.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        Off | 00000000:81:00.0 Off |                  N/A |
|  0%   42C    P2              42W / 170W |   2249MiB / 12288MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     14096      C   ffmpeg                                      291MiB |
|    0   N/A  N/A     41394      C   python                                     1948MiB |
+---------------------------------------------------------------------------------------+

@phial3
Author

phial3 commented Nov 28, 2024

The main cause appears to be in src/driver/safe/threading.rs:

impl CudaDevice {
    /// Binds the device to the calling thread. You must call this before
    /// using the device on a separate thread!
    pub fn bind_to_thread(&self) -> Result<(), DriverError> {
        unsafe { result::ctx::set_current(self.cu_primary_ctx) }
    }
}

The test fails:

cargo test test_threading --features cuda-12020

error:

running 1 test
test driver::safe::threading::tests::test_threading ... FAILED

failures:

---- driver::safe::threading::tests::test_threading stdout ----
thread 'driver::safe::threading::tests::test_threading' panicked at src/driver/safe/threading.rs:21:39:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_STUB_LIBRARY, "<Failure when calling cuGetErrorString()>")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    driver::safe::threading::tests::test_threading

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 156 filtered out; finished in 0.00s

error: test failed, to rerun pass `--lib`

@phial3
Author

phial3 commented Nov 28, 2024

[root@mh-server-88 driver]# ls -al /usr/local/cuda/lib64/*
lrwxrwxrwx 1 root root        19 Oct 20  2023 /usr/local/cuda/lib64/libaccinj64.so -> libaccinj64.so.12.2
lrwxrwxrwx 1 root root        23 Oct 20  2023 /usr/local/cuda/lib64/libaccinj64.so.12.2 -> libaccinj64.so.12.2.142
-rwxr-xr-x 1 root root   2412216 Oct 20  2023 /usr/local/cuda/lib64/libaccinj64.so.12.2.142
lrwxrwxrwx 1 root root        17 Oct 20  2023 /usr/local/cuda/lib64/libcublasLt.so -> libcublasLt.so.12
lrwxrwxrwx 1 root root        25 Oct 20  2023 /usr/local/cuda/lib64/libcublasLt.so.12 -> ./libcublasLt.so.12.2.5.6
-rwxr-xr-x 1 root root 525843792 Oct 20  2023 /usr/local/cuda/lib64/libcublasLt.so.12.2.5.6
-rw-r--r-- 1 root root 770686098 Oct 20  2023 /usr/local/cuda/lib64/libcublasLt_static.a
lrwxrwxrwx 1 root root        15 Oct 20  2023 /usr/local/cuda/lib64/libcublas.so -> libcublas.so.12
lrwxrwxrwx 1 root root        23 Oct 20  2023 /usr/local/cuda/lib64/libcublas.so.12 -> ./libcublas.so.12.2.5.6
-rwxr-xr-x 1 root root 106675248 Oct 20  2023 /usr/local/cuda/lib64/libcublas.so.12.2.5.6
-rw-r--r-- 1 root root 168600104 Oct 20  2023 /usr/local/cuda/lib64/libcublas_static.a
-rw-r--r-- 1 root root   1653530 Oct 20  2023 /usr/local/cuda/lib64/libcudadevrt.a
lrwxrwxrwx 1 root root        15 Oct 20  2023 /usr/local/cuda/lib64/libcudart.so -> libcudart.so.12
lrwxrwxrwx 1 root root        21 Oct 20  2023 /usr/local/cuda/lib64/libcudart.so.12 -> libcudart.so.12.2.140
-rwxr-xr-x 1 root root    683360 Oct 20  2023 /usr/local/cuda/lib64/libcudart.so.12.2.140
-rw-r--r-- 1 root root   1379326 Oct 20  2023 /usr/local/cuda/lib64/libcudart_static.a
-rwxr-xr-x 1 root root 125102160 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_adv_infer.so
-rwxr-xr-x 1 root root 125102160 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_adv_infer.so.8
-rwxr-xr-x 1 root root 125102160 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_adv_infer.so.8.9.4
-rw-r--r-- 1 root root 127681322 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_adv_infer_static.a
-rw-r--r-- 1 root root 127681322 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_adv_infer_static_v8.a
-rwxr-xr-x 1 root root 116110224 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_adv_train.so
-rwxr-xr-x 1 root root 116110224 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_adv_train.so.8
-rwxr-xr-x 1 root root 116110224 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_adv_train.so.8.9.4
-rw-r--r-- 1 root root 118540274 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_adv_train_static.a
-rw-r--r-- 1 root root 118540274 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_adv_train_static_v8.a
-rwxr-xr-x 1 root root 613288264 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_cnn_infer.so
-rwxr-xr-x 1 root root 613288264 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_cnn_infer.so.8
-rwxr-xr-x 1 root root 613288264 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_cnn_infer.so.8.9.4
-rw-r--r-- 1 root root 736626820 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_cnn_infer_static.a
-rw-r--r-- 1 root root 736626820 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_cnn_infer_static_v8.a
-rwxr-xr-x 1 root root 125264112 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_cnn_train.so
-rwxr-xr-x 1 root root 125264112 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_cnn_train.so.8
-rwxr-xr-x 1 root root 125264112 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_cnn_train.so.8.9.4
-rw-r--r-- 1 root root 176155304 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_cnn_train_static.a
-rw-r--r-- 1 root root 176155304 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_cnn_train_static_v8.a
-rwxr-xr-x 1 root root  90833344 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_ops_infer.so
-rwxr-xr-x 1 root root  90833344 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_ops_infer.so.8
-rwxr-xr-x 1 root root  90833344 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_ops_infer.so.8.9.4
-rw-r--r-- 1 root root  94002920 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_ops_infer_static.a
-rw-r--r-- 1 root root  94002920 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_ops_infer_static_v8.a
-rwxr-xr-x 1 root root  70930680 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_ops_train.so
-rwxr-xr-x 1 root root  70930680 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_ops_train.so.8
-rwxr-xr-x 1 root root  70930680 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_ops_train.so.8.9.4
-rw-r--r-- 1 root root  71395778 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_ops_train_static.a
-rw-r--r-- 1 root root  71395778 Oct 20  2023 /usr/local/cuda/lib64/libcudnn_ops_train_static_v8.a
-rwxr-xr-x 1 root root    142008 Oct 20  2023 /usr/local/cuda/lib64/libcudnn.so
-rwxr-xr-x 1 root root    142008 Oct 20  2023 /usr/local/cuda/lib64/libcudnn.so.8
-rwxr-xr-x 1 root root    142008 Oct 20  2023 /usr/local/cuda/lib64/libcudnn.so.8.9.4
lrwxrwxrwx 1 root root        14 Oct 20  2023 /usr/local/cuda/lib64/libcufft.so -> libcufft.so.11
lrwxrwxrwx 1 root root        22 Oct 20  2023 /usr/local/cuda/lib64/libcufft.so.11 -> libcufft.so.11.0.8.103
-rwxr-xr-x 1 root root 178387496 Oct 20  2023 /usr/local/cuda/lib64/libcufft.so.11.0.8.103
-rw-r--r-- 1 root root 198130050 Oct 20  2023 /usr/local/cuda/lib64/libcufft_static.a
-rw-r--r-- 1 root root 198033562 Oct 20  2023 /usr/local/cuda/lib64/libcufft_static_nocallback.a
lrwxrwxrwx 1 root root        15 Oct 20  2023 /usr/local/cuda/lib64/libcufftw.so -> libcufftw.so.11
lrwxrwxrwx 1 root root        23 Oct 20  2023 /usr/local/cuda/lib64/libcufftw.so.11 -> libcufftw.so.11.0.8.103
-rwxr-xr-x 1 root root   1622536 Oct 20  2023 /usr/local/cuda/lib64/libcufftw.so.11.0.8.103
-rw-r--r-- 1 root root     32078 Oct 20  2023 /usr/local/cuda/lib64/libcufftw_static.a
lrwxrwxrwx 1 root root        19 Oct 20  2023 /usr/local/cuda/lib64/libcufile_rdma.so -> libcufile_rdma.so.1
lrwxrwxrwx 1 root root        23 Oct 20  2023 /usr/local/cuda/lib64/libcufile_rdma.so.1 -> libcufile_rdma.so.1.7.2
-rwxr-xr-x 1 root root     43320 Oct 20  2023 /usr/local/cuda/lib64/libcufile_rdma.so.1.7.2
-rwxr-xr-x 1 root root     65206 Oct 20  2023 /usr/local/cuda/lib64/libcufile_rdma_static.a
lrwxrwxrwx 1 root root        14 Oct 20  2023 /usr/local/cuda/lib64/libcufile.so -> libcufile.so.0
lrwxrwxrwx 1 root root        18 Oct 20  2023 /usr/local/cuda/lib64/libcufile.so.0 -> libcufile.so.1.7.2
-rwxr-xr-x 1 root root   2970672 Oct 20  2023 /usr/local/cuda/lib64/libcufile.so.1.7.2
-rwxr-xr-x 1 root root  24195108 Oct 20  2023 /usr/local/cuda/lib64/libcufile_static.a
-rw-r--r-- 1 root root    954224 Oct 20  2023 /usr/local/cuda/lib64/libcufilt.a
lrwxrwxrwx 1 root root        18 Oct 20  2023 /usr/local/cuda/lib64/libcuinj64.so -> libcuinj64.so.12.2
lrwxrwxrwx 1 root root        22 Oct 20  2023 /usr/local/cuda/lib64/libcuinj64.so.12.2 -> libcuinj64.so.12.2.142
-rwxr-xr-x 1 root root   2832640 Oct 20  2023 /usr/local/cuda/lib64/libcuinj64.so.12.2.142
-rw-r--r-- 1 root root     30922 Oct 20  2023 /usr/local/cuda/lib64/libculibos.a
lrwxrwxrwx 1 root root        15 Oct 20  2023 /usr/local/cuda/lib64/libcurand.so -> libcurand.so.10
lrwxrwxrwx 1 root root        23 Oct 20  2023 /usr/local/cuda/lib64/libcurand.so.10 -> libcurand.so.10.3.3.141
-rwxr-xr-x 1 root root  96853424 Oct 20  2023 /usr/local/cuda/lib64/libcurand.so.10.3.3.141
-rw-r--r-- 1 root root  96943386 Oct 20  2023 /usr/local/cuda/lib64/libcurand_static.a
-rw-r--r-- 1 root root  16767866 Oct 20  2023 /usr/local/cuda/lib64/libcusolver_lapack_static.a
-rw-r--r-- 1 root root   1005514 Oct 20  2023 /usr/local/cuda/lib64/libcusolver_metis_static.a
lrwxrwxrwx 1 root root        19 Oct 20  2023 /usr/local/cuda/lib64/libcusolverMg.so -> libcusolverMg.so.11
lrwxrwxrwx 1 root root        27 Oct 20  2023 /usr/local/cuda/lib64/libcusolverMg.so.11 -> libcusolverMg.so.11.5.2.141
-rwxr-xr-x 1 root root  82798736 Oct 20  2023 /usr/local/cuda/lib64/libcusolverMg.so.11.5.2.141
lrwxrwxrwx 1 root root        17 Oct 20  2023 /usr/local/cuda/lib64/libcusolver.so -> libcusolver.so.11
lrwxrwxrwx 1 root root        25 Oct 20  2023 /usr/local/cuda/lib64/libcusolver.so.11 -> libcusolver.so.11.5.2.141
-rwxr-xr-x 1 root root 115505432 Oct 20  2023 /usr/local/cuda/lib64/libcusolver.so.11.5.2.141
-rw-r--r-- 1 root root 133473132 Oct 20  2023 /usr/local/cuda/lib64/libcusolver_static.a
lrwxrwxrwx 1 root root        17 Oct 20  2023 /usr/local/cuda/lib64/libcusparse.so -> libcusparse.so.12
lrwxrwxrwx 1 root root        25 Oct 20  2023 /usr/local/cuda/lib64/libcusparse.so.12 -> libcusparse.so.12.1.2.141
-rwxr-xr-x 1 root root 263825056 Oct 20  2023 /usr/local/cuda/lib64/libcusparse.so.12.1.2.141
-rw-r--r-- 1 root root 300804590 Oct 20  2023 /usr/local/cuda/lib64/libcusparse_static.a
-rw-r--r-- 1 root root   1005514 Oct 20  2023 /usr/local/cuda/lib64/libmetis_static.a
lrwxrwxrwx 1 root root        13 Oct 20  2023 /usr/local/cuda/lib64/libnppc.so -> libnppc.so.12
lrwxrwxrwx 1 root root        19 Oct 20  2023 /usr/local/cuda/lib64/libnppc.so.12 -> libnppc.so.12.2.1.4
-rwxr-xr-x 1 root root   1622512 Oct 20  2023 /usr/local/cuda/lib64/libnppc.so.12.2.1.4
-rw-r--r-- 1 root root     30686 Oct 20  2023 /usr/local/cuda/lib64/libnppc_static.a
lrwxrwxrwx 1 root root        15 Oct 20  2023 /usr/local/cuda/lib64/libnppial.so -> libnppial.so.12
lrwxrwxrwx 1 root root        21 Oct 20  2023 /usr/local/cuda/lib64/libnppial.so.12 -> libnppial.so.12.2.1.4
-rwxr-xr-x 1 root root  16311088 Oct 20  2023 /usr/local/cuda/lib64/libnppial.so.12.2.1.4
-rw-r--r-- 1 root root  17784656 Oct 20  2023 /usr/local/cuda/lib64/libnppial_static.a
lrwxrwxrwx 1 root root        15 Oct 20  2023 /usr/local/cuda/lib64/libnppicc.so -> libnppicc.so.12
lrwxrwxrwx 1 root root        21 Oct 20  2023 /usr/local/cuda/lib64/libnppicc.so.12 -> libnppicc.so.12.2.1.4
-rwxr-xr-x 1 root root   7115592 Oct 20  2023 /usr/local/cuda/lib64/libnppicc.so.12.2.1.4
-rw-r--r-- 1 root root   6670872 Oct 20  2023 /usr/local/cuda/lib64/libnppicc_static.a
lrwxrwxrwx 1 root root        16 Oct 20  2023 /usr/local/cuda/lib64/libnppidei.so -> libnppidei.so.12
lrwxrwxrwx 1 root root        22 Oct 20  2023 /usr/local/cuda/lib64/libnppidei.so.12 -> libnppidei.so.12.2.1.4
-rwxr-xr-x 1 root root  10998936 Oct 20  2023 /usr/local/cuda/lib64/libnppidei.so.12.2.1.4
-rw-r--r-- 1 root root  11848492 Oct 20  2023 /usr/local/cuda/lib64/libnppidei_static.a
lrwxrwxrwx 1 root root        14 Oct 20  2023 /usr/local/cuda/lib64/libnppif.so -> libnppif.so.12
lrwxrwxrwx 1 root root        20 Oct 20  2023 /usr/local/cuda/lib64/libnppif.so.12 -> libnppif.so.12.2.1.4
-rwxr-xr-x 1 root root  96000104 Oct 20  2023 /usr/local/cuda/lib64/libnppif.so.12.2.1.4
-rw-r--r-- 1 root root  98848688 Oct 20  2023 /usr/local/cuda/lib64/libnppif_static.a
lrwxrwxrwx 1 root root        14 Oct 20  2023 /usr/local/cuda/lib64/libnppig.so -> libnppig.so.12
lrwxrwxrwx 1 root root        20 Oct 20  2023 /usr/local/cuda/lib64/libnppig.so.12 -> libnppig.so.12.2.1.4
-rwxr-xr-x 1 root root  38851472 Oct 20  2023 /usr/local/cuda/lib64/libnppig.so.12.2.1.4
-rw-r--r-- 1 root root  39688494 Oct 20  2023 /usr/local/cuda/lib64/libnppig_static.a
lrwxrwxrwx 1 root root        14 Oct 20  2023 /usr/local/cuda/lib64/libnppim.so -> libnppim.so.12
lrwxrwxrwx 1 root root        20 Oct 20  2023 /usr/local/cuda/lib64/libnppim.so.12 -> libnppim.so.12.2.1.4
-rwxr-xr-x 1 root root   9884488 Oct 20  2023 /usr/local/cuda/lib64/libnppim.so.12.2.1.4
-rw-r--r-- 1 root root   8816024 Oct 20  2023 /usr/local/cuda/lib64/libnppim_static.a
lrwxrwxrwx 1 root root        15 Oct 20  2023 /usr/local/cuda/lib64/libnppist.so -> libnppist.so.12
lrwxrwxrwx 1 root root        21 Oct 20  2023 /usr/local/cuda/lib64/libnppist.so.12 -> libnppist.so.12.2.1.4
-rwxr-xr-x 1 root root  38032464 Oct 20  2023 /usr/local/cuda/lib64/libnppist.so.12.2.1.4
-rw-r--r-- 1 root root  39118336 Oct 20  2023 /usr/local/cuda/lib64/libnppist_static.a
lrwxrwxrwx 1 root root        15 Oct 20  2023 /usr/local/cuda/lib64/libnppisu.so -> libnppisu.so.12
lrwxrwxrwx 1 root root        21 Oct 20  2023 /usr/local/cuda/lib64/libnppisu.so.12 -> libnppisu.so.12.2.1.4
-rwxr-xr-x 1 root root    695720 Oct 20  2023 /usr/local/cuda/lib64/libnppisu.so.12.2.1.4
-rw-r--r-- 1 root root     11266 Oct 20  2023 /usr/local/cuda/lib64/libnppisu_static.a
lrwxrwxrwx 1 root root        15 Oct 20  2023 /usr/local/cuda/lib64/libnppitc.so -> libnppitc.so.12
lrwxrwxrwx 1 root root        21 Oct 20  2023 /usr/local/cuda/lib64/libnppitc.so.12 -> libnppitc.so.12.2.1.4
-rwxr-xr-x 1 root root   5386864 Oct 20  2023 /usr/local/cuda/lib64/libnppitc.so.12.2.1.4
-rw-r--r-- 1 root root   4371578 Oct 20  2023 /usr/local/cuda/lib64/libnppitc_static.a
lrwxrwxrwx 1 root root        13 Oct 20  2023 /usr/local/cuda/lib64/libnpps.so -> libnpps.so.12
lrwxrwxrwx 1 root root        19 Oct 20  2023 /usr/local/cuda/lib64/libnpps.so.12 -> libnpps.so.12.2.1.4
-rwxr-xr-x 1 root root  19961080 Oct 20  2023 /usr/local/cuda/lib64/libnpps.so.12.2.1.4
-rw-r--r-- 1 root root  19844314 Oct 20  2023 /usr/local/cuda/lib64/libnpps_static.a
lrwxrwxrwx 1 root root        15 Oct 20  2023 /usr/local/cuda/lib64/libnvblas.so -> libnvblas.so.12
lrwxrwxrwx 1 root root        23 Oct 20  2023 /usr/local/cuda/lib64/libnvblas.so.12 -> ./libnvblas.so.12.2.5.6
-rwxr-xr-x 1 root root    728856 Oct 20  2023 /usr/local/cuda/lib64/libnvblas.so.12.2.5.6
lrwxrwxrwx 1 root root        18 Oct 20  2023 /usr/local/cuda/lib64/libnvJitLink.so -> libnvJitLink.so.12
lrwxrwxrwx 1 root root        24 Oct 20  2023 /usr/local/cuda/lib64/libnvJitLink.so.12 -> libnvJitLink.so.12.2.140
-rwxr-xr-x 1 root root  49621536 Oct 20  2023 /usr/local/cuda/lib64/libnvJitLink.so.12.2.140
-rw-r--r-- 1 root root  63033352 Oct 20  2023 /usr/local/cuda/lib64/libnvJitLink_static.a
lrwxrwxrwx 1 root root        15 Oct 20  2023 /usr/local/cuda/lib64/libnvjpeg.so -> libnvjpeg.so.12
lrwxrwxrwx 1 root root        21 Oct 20  2023 /usr/local/cuda/lib64/libnvjpeg.so.12 -> libnvjpeg.so.12.2.2.4
-rwxr-xr-x 1 root root   6689584 Oct 20  2023 /usr/local/cuda/lib64/libnvjpeg.so.12.2.2.4
-rw-r--r-- 1 root root   6915044 Oct 20  2023 /usr/local/cuda/lib64/libnvjpeg_static.a
-rw-r--r-- 1 root root  43142322 Oct 20  2023 /usr/local/cuda/lib64/libnvptxcompiler_static.a
lrwxrwxrwx 1 root root        25 Oct 20  2023 /usr/local/cuda/lib64/libnvrtc-builtins.so -> libnvrtc-builtins.so.12.2
lrwxrwxrwx 1 root root        29 Oct 20  2023 /usr/local/cuda/lib64/libnvrtc-builtins.so.12.2 -> libnvrtc-builtins.so.12.2.140
-rwxr-xr-x 1 root root   2434952 Oct 20  2023 /usr/local/cuda/lib64/libnvrtc-builtins.so.12.2.140
-rw-r--r-- 1 root root   2453996 Oct 20  2023 /usr/local/cuda/lib64/libnvrtc-builtins_static.a
lrwxrwxrwx 1 root root        14 Oct 20  2023 /usr/local/cuda/lib64/libnvrtc.so -> libnvrtc.so.12
lrwxrwxrwx 1 root root        20 Oct 20  2023 /usr/local/cuda/lib64/libnvrtc.so.12 -> libnvrtc.so.12.2.140
-rwxr-xr-x 1 root root  58041000 Oct 20  2023 /usr/local/cuda/lib64/libnvrtc.so.12.2.140
-rw-r--r-- 1 root root  74423956 Oct 20  2023 /usr/local/cuda/lib64/libnvrtc_static.a
lrwxrwxrwx 1 root root        18 Oct 20  2023 /usr/local/cuda/lib64/libnvToolsExt.so -> libnvToolsExt.so.1
lrwxrwxrwx 1 root root        22 Oct 20  2023 /usr/local/cuda/lib64/libnvToolsExt.so.1 -> libnvToolsExt.so.1.0.0
-rwxr-xr-x 1 root root     40136 Oct 20  2023 /usr/local/cuda/lib64/libnvToolsExt.so.1.0.0
lrwxrwxrwx 1 root root        14 Oct 20  2023 /usr/local/cuda/lib64/libOpenCL.so -> libOpenCL.so.1
lrwxrwxrwx 1 root root        16 Oct 20  2023 /usr/local/cuda/lib64/libOpenCL.so.1 -> libOpenCL.so.1.0
lrwxrwxrwx 1 root root        18 Oct 20  2023 /usr/local/cuda/lib64/libOpenCL.so.1.0 -> libOpenCL.so.1.0.0
-rwxr-xr-x 1 root root     30856 Oct 20  2023 /usr/local/cuda/lib64/libOpenCL.so.1.0.0

/usr/local/cuda/lib64/cmake:
total 12
drwxr-xr-x 6 root root   61 Oct 20  2023 .
drwxr-xr-x 4 root root 8192 Nov 28 15:17 ..
drwxr-xr-x 2 root root   64 Oct 20  2023 cccl
drwxr-xr-x 2 root root   93 Oct 20  2023 cub
drwxr-xr-x 2 root root  114 Oct 20  2023 libcudacxx
drwxr-xr-x 2 root root  140 Oct 20  2023 thrust

/usr/local/cuda/lib64/stubs:
total 2212
drwxr-xr-x 2 root root   4096 Oct 20  2023 .
drwxr-xr-x 4 root root   8192 Nov 28 15:17 ..
-rwxr-xr-x 1 root root  38872 Oct 20  2023 libcublasLt.so
-rwxr-xr-x 1 root root  79832 Oct 20  2023 libcublas.so
-rwxr-xr-x 1 root root  66272 Oct 20  2023 libcuda.so
-rwxr-xr-x 1 root root   9400 Oct 20  2023 libcufft.so
-rwxr-xr-x 1 root root  13496 Oct 20  2023 libcufftw.so
-rwxr-xr-x 1 root root   9400 Oct 20  2023 libcurand.so
-rwxr-xr-x 1 root root  29880 Oct 20  2023 libcusolverMg.so
-rwxr-xr-x 1 root root 111800 Oct 20  2023 libcusolver.so
-rwxr-xr-x 1 root root  54456 Oct 20  2023 libcusparse.so
-rwxr-xr-x 1 root root   5304 Oct 20  2023 libnppc.so
-rwxr-xr-x 1 root root 259256 Oct 20  2023 libnppial.so
-rwxr-xr-x 1 root root 136376 Oct 20  2023 libnppicc.so
-rwxr-xr-x 1 root root 177336 Oct 20  2023 libnppidei.so
-rwxr-xr-x 1 root root 263352 Oct 20  2023 libnppif.so
-rwxr-xr-x 1 root root  87224 Oct 20  2023 libnppig.so
-rwxr-xr-x 1 root root  42168 Oct 20  2023 libnppim.so
-rwxr-xr-x 1 root root 427192 Oct 20  2023 libnppist.so
-rwxr-xr-x 1 root root   9400 Oct 20  2023 libnppisu.so
-rwxr-xr-x 1 root root  54456 Oct 20  2023 libnppitc.so
-rwxr-xr-x 1 root root 222392 Oct 20  2023 libnpps.so
-rwxr-xr-x 1 root root  55064 Oct 20  2023 libnvidia-ml.so
-rwxr-xr-x 1 root root   9400 Oct 20  2023 libnvJitLink.so
-rwxr-xr-x 1 root root  13496 Oct 20  2023 libnvjpeg.so
-rwxr-xr-x 1 root root   5304 Oct 20  2023 libnvrtc.so

@phial3
Author

phial3 commented Nov 28, 2024

[root@mh-server-88 driver]# ldconfig -p | grep cuda
	libnvrtc.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvrtc.so.12
	libnvrtc.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvrtc.so
	libnvrtc-builtins.so.12.2 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.12.2
	libnvrtc-builtins.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvrtc-builtins.so
	libnvjpeg.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvjpeg.so.12
	libnvjpeg.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvjpeg.so
	libnvblas.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvblas.so.12
	libnvblas.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvblas.so
	libnvToolsExt.so.1 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvToolsExt.so.1
	libnvToolsExt.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvToolsExt.so
	libnvJitLink.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvJitLink.so.12
	libnvJitLink.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvJitLink.so
	libnpps.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnpps.so.12
	libnpps.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnpps.so
	libnppitc.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppitc.so.12
	libnppitc.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppitc.so
	libnppisu.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppisu.so.12
	libnppisu.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppisu.so
	libnppist.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppist.so.12
	libnppist.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppist.so
	libnppim.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppim.so.12
	libnppim.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppim.so
	libnppig.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppig.so.12
	libnppig.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppig.so
	libnppif.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppif.so.12
	libnppif.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppif.so
	libnppidei.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppidei.so.12
	libnppidei.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppidei.so
	libnppicc.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppicc.so.12
	libnppicc.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppicc.so
	libnppial.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppial.so.12
	libnppial.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppial.so
	libnppc.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppc.so.12
	libnppc.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnppc.so
	libcusparse.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcusparse.so.12
	libcusparse.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcusparse.so
	libcusolverMg.so.11 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcusolverMg.so.11
	libcusolverMg.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcusolverMg.so
	libcusolver.so.11 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcusolver.so.11
	libcusolver.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcusolver.so
	libcurand.so.10 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcurand.so.10
	libcurand.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcurand.so
	libcuinj64.so.12.2 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcuinj64.so.12.2
	libcuinj64.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcuinj64.so
	libcufile_rdma.so.1 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcufile_rdma.so.1
	libcufile_rdma.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcufile_rdma.so
	libcufile.so.0 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcufile.so.0
	libcufile.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcufile.so
	libcufftw.so.11 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcufftw.so.11
	libcufftw.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcufftw.so
	libcufft.so.11 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcufft.so.11
	libcufft.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcufft.so
	libcudnn_ops_train.so.8 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
	libcudnn_ops_infer.so.8 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
	libcudnn_cnn_train.so.8 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
	libcudnn_cnn_infer.so.8 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
	libcudnn_adv_train.so.8 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
	libcudnn_adv_infer.so.8 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
	libcudnn.so.8 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudnn.so.8
	libcudart.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudart.so.12
	libcudart.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcudart.so
	libcudadebugger.so.1 (libc6,x86-64) => /lib64/libcudadebugger.so.1
	libcuda.so.1 (libc6,x86-64) => /lib64/libcuda.so.1
	libcuda.so (libc6,x86-64) => /lib64/libcuda.so
	libcublasLt.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublasLt.so.12
	libcublasLt.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublasLt.so
	libcublas.so.12 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublas.so.12
	libcublas.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcublas.so
	libaccinj64.so.12.2 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libaccinj64.so.12.2
	libaccinj64.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libaccinj64.so
	libOpenCL.so.1 (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libOpenCL.so.1
	libOpenCL.so (libc6,x86-64) => /usr/local/cuda-12.2/targets/x86_64-linux/lib/libOpenCL.so
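
In the listing above, the linker cache resolves libcuda.so.1 to /lib64/libcuda.so.1, while the toolkit also ships a libcuda.so under lib64/stubs. A minimal sketch for checking this programmatically, assuming the `ldconfig -p` line format shown above; `resolve_lib` and `is_stub_path` are hypothetical helper names. Keep in mind that the dynamic loader consults LD_LIBRARY_PATH before this cache, so the cache alone does not settle which file a process actually loads.

```rust
/// Extracts the resolved path for `lib_name` from `ldconfig -p` style output.
/// A matching line looks like:
/// "\tlibcuda.so.1 (libc6,x86-64) => /lib64/libcuda.so.1"
fn resolve_lib<'a>(ldconfig_output: &'a str, lib_name: &str) -> Option<&'a str> {
    ldconfig_output.lines().find_map(|line| {
        // Split "name (arch) => path" at the arrow.
        let (left, path) = line.split_once("=>")?;
        // The library name is the first whitespace-separated token.
        let name = left.split_whitespace().next()?;
        if name == lib_name {
            Some(path.trim())
        } else {
            None
        }
    })
}

/// True if any component of the path is a directory named `stubs`.
fn is_stub_path(path: &str) -> bool {
    path.split('/').any(|part| part == "stubs")
}
```

Feeding the captured `ldconfig -p` text to `resolve_lib(text, "libcuda.so.1")` and passing the result to `is_stub_path` shows whether the cached entry points at a stub.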

@phial3
Author

phial3 commented Nov 28, 2024

[root@mh-server-88 driver]# lspci | grep -i nvidia
81:00.0 VGA compatible controller: NVIDIA Corporation Device 2487 (rev a1)
81:00.1 Audio device: NVIDIA Corporation Device 228b (rev a1)

[root@mh-server-88 driver]# lsmod | grep nvidia
nvidia_uvm           1287341  4
nvidia_drm             58061  0
nvidia_modeset       1294577  1 nvidia_drm
nvidia              56563717  584 nvidia_modeset,nvidia_uvm
drm_kms_helper        186531  2 ast,nvidia_drm
drm                   468454  6 ast,ttm,drm_kms_helper,nvidia,nvidia_drm

@phial3
Author

phial3 commented Nov 28, 2024

I added a test to src/driver/result.rs:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn link_test() {
        // CUDA driver link init
        let init_result = unsafe { lib().cuInit(0) };
        println!("CUDA init result: {:?}", init_result);

        let mut version: i32 = 0;
        let result = unsafe { lib().cuDriverGetVersion(&mut version as *mut i32) };
        match result {
            sys::CUresult::CUDA_SUCCESS => {
                println!("Driver Version = {:?}", version);
            }
            _ => {
                panic!("Cannot get driver version");
            }
        }
    }
}

Running this test in src/driver/result.rs succeeds:

[root@mh-server-88 driver]# cargo test link_test --features cuda-12020
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.02s
     Running unittests src/lib.rs (/home/temp/test/cudarc/target/debug/deps/cudarc-6fe0907686d62a51)

running 1 test
CUDA init result: CUDA_SUCCESS
Driver Version = 12020
test driver::result::tests::link_test ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 157 filtered out; finished in 0.01s
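
The value 12020 printed above follows the CUDA version encoding of 1000 × major + 10 × minor, so it corresponds to 12.2, matching the "CUDA Version: 12.2" reported by nvidia-smi earlier. A small sketch of the decoding; `decode_cuda_version` is a hypothetical helper name, not a cudarc function.

```rust
/// Splits a CUDA version integer (1000 * major + 10 * minor)
/// into a (major, minor) pair.
fn decode_cuda_version(raw: i32) -> (i32, i32) {
    let major = raw / 1000;
    let minor = (raw % 1000) / 10;
    (major, minor)
}
```

For example, `decode_cuda_version(12020)` yields `(12, 2)`.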

@phial3
Author

phial3 commented Nov 28, 2024

PS:

  1. Error: CUDA_ERROR_STUB_LIBRARY
     export LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64:/usr/lib64:$LD_LIBRARY_PATH
  2. Error: DriverError(CUDA_ERROR_NO_DEVICE, "no CUDA-capable device is detected")
     ?
  3. Error: DriverError(CUDA_ERROR_NOT_INITIALIZED, "initialization error")
     The CUDA driver initialization method is init() in src/driver/result.rs:
/// Initializes the CUDA driver API.
/// **MUST BE CALLED BEFORE ANYTHING ELSE**
///
/// See [cuInit() docs](https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__INITIALIZE.html#group__CUDA__INITIALIZE_1g0a2f1517e1bd8502c7194c3a8c134bc3)
pub fn init() -> Result<(), DriverError> {
    unsafe { lib().cuInit(0).result() }
}
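
For the CUDA_ERROR_STUB_LIBRARY entry in the list above, one thing worth ruling out is a `stubs` directory on the library search path, since the toolkit's lib64/stubs contains a placeholder libcuda.so. A minimal sketch, assuming a colon-separated LD_LIBRARY_PATH as on Linux; `stub_entries` and `stub_entries_in_env` are hypothetical helper names.

```rust
use std::env;

/// Returns the entries of a colon-separated library path whose last
/// directory component is `stubs` (a trailing slash is tolerated).
fn stub_entries(library_path: &str) -> Vec<&str> {
    library_path
        .split(':')
        .filter(|entry| entry.trim_end_matches('/').ends_with("/stubs"))
        .collect()
}

/// Reads LD_LIBRARY_PATH from the environment and reports stub entries.
/// An unset variable is treated as an empty path.
fn stub_entries_in_env() -> Vec<String> {
    let value = env::var("LD_LIBRARY_PATH").unwrap_or_default();
    stub_entries(&value).into_iter().map(String::from).collect()
}
```

An empty result from `stub_entries_in_env()` only rules out this one cause; a stub can still be picked up through link-time paths or an rpath.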

Even when the call to CudaDevice::new(0) succeeds, a runtime error still occurs: Error: DriverError(CUDA_ERROR_NOT_INITIALIZED, "initialization error"):

impl CudaDevice {
    /// Creates a new [CudaDevice] on device index `ordinal`.
    pub fn new(ordinal: usize) -> Result<Arc<Self>, result::DriverError> {
        // called in every CudaDevice::new() 
        result::init()?;

        let cu_device = result::device::get(ordinal as i32)?;

        // primary context initialization, can fail with OOM
        let cu_primary_ctx = unsafe { result::primary_ctx::retain(cu_device) }?;

        unsafe { result::ctx::set_current(cu_primary_ctx) }.unwrap();

        // can fail with OOM
        let event = result::event::create(sys::CUevent_flags::CU_EVENT_DISABLE_TIMING)?;

        let value = unsafe {
            result::device::get_attribute(
                cu_device,
                sys::CUdevice_attribute_enum::CU_DEVICE_ATTRIBUTE_MEMORY_POOLS_SUPPORTED,
            )?
        };
        let is_async = value > 0;

        let device = CudaDevice {
            cu_device,
            cu_primary_ctx,
            stream: std::ptr::null_mut(),
            event,
            modules: RwLock::new(BTreeMap::new()),
            ordinal,
            is_async,
        };
        Ok(Arc::new(device))
    }
}
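
The three errors reported in this thread can be summarized as a lookup table. This is a sketch only: the numeric codes are the CUresult values for these errors, while `SeenError`, `from_code`, and `first_check` are hypothetical names, and the hints are suggestions for what to inspect first, not a confirmed diagnosis of this issue.

```rust
/// The three CUresult values reported in this thread, with their numeric codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SeenError {
    NotInitialized = 3,
    StubLibrary = 34,
    NoDevice = 100,
}

/// Maps a raw CUresult code to one of the errors seen in this thread.
fn from_code(code: u32) -> Option<SeenError> {
    match code {
        3 => Some(SeenError::NotInitialized),
        34 => Some(SeenError::StubLibrary),
        100 => Some(SeenError::NoDevice),
        _ => None,
    }
}

/// Suggests what to inspect first for each error (an assumption, not a fix).
fn first_check(err: SeenError) -> &'static str {
    match err {
        SeenError::StubLibrary => {
            "the loaded libcuda.so is a toolkit stub; inspect LD_LIBRARY_PATH and link paths"
        }
        SeenError::NoDevice => {
            "the driver found no usable GPU; compare with nvidia-smi and device visibility settings"
        }
        SeenError::NotInitialized => {
            "cuInit has not run or has failed in this process; check which libcuda.so was loaded"
        }
    }
}
```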

@coreylowman
Owner

Thanks for all these additional details. So you are still getting the initialization error at this point?

@coreylowman
Owner

I wonder if this is linked to #253, which we could never figure out.

@phial3
Author

phial3 commented Dec 20, 2024

Thanks for all these additional details. So you are still getting the initialization error at this point?

yes
