Unable to register cuDNN factory #63

Open
Shakirahnamuli12 opened this issue Feb 7, 2025 · 6 comments

Comments

@Shakirahnamuli12

I am running ReLERNN in Google Colab and am encountering the errors below. Could you kindly help me understand how to fix them? Some of the output is shown below.

"""2025-02-07 10:57:23.227224: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-02-07 10:57:23.227271: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-02-07 10:57:23.228817: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-07 10:57:23.237717: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.""

Below is the first fatal error:

FileNotFoundError: [Errno 2] No such file or directory: '/content/ReLERNN/examples/example_output/train/3615_haps.npy'
2025-02-06 18:59:23.112739: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered

@andrewkern
Member

this looks like a file path issue:

No such file or directory: '/content/ReLERNN/examples/example_output/train/3615_haps.npy'

what is the command line you are running?

the other outputs you've included above are tensorflow reporting on the installation you've done.
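
To separate TensorFlow's own reporting from the Python exception that actually stops the run, the log can be triaged with a few lines of Python. This is only a minimal sketch and not part of ReLERNN: `triage` is a hypothetical helper, and both regular expressions are assumptions based on the log lines quoted in this thread.

```python
import re
from collections import Counter

# Assumed shape of a TensorFlow C++ log line, based on the lines quoted here:
#   "<date> <time>: <LEVEL> <source file>:<line>] <message>"
TF_LOG = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: ([IWEF]) \S+:\d+\] (.*)$"
)
# A Python exception line such as "FileNotFoundError: [Errno 2] ..."
PY_ERROR = re.compile(r"^(\w+(?:Error|Exception)): (.*)$")


def triage(log_text):
    """Return (TensorFlow level counts, list of (exception name, message))."""
    levels = Counter()
    exceptions = []
    for line in log_text.splitlines():
        line = line.strip()
        tf_match = TF_LOG.match(line)
        if tf_match:
            # Count the line under its severity letter: I, W, E or F.
            levels[tf_match.group(1)] += 1
            continue
        py_match = PY_ERROR.match(line)
        if py_match:
            exceptions.append((py_match.group(1), py_match.group(2)))
    return levels, exceptions
```

Running `triage` on the pasted output should leave the `FileNotFoundError` in the exceptions list, while the cuDNN, cuFFT and cuBLAS registration messages are only counted under the TensorFlow `E` level.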

@Shakirahnamuli12
Author

Thanks, Andrew, for your prompt feedback. I am running ReLERNN in a Google Colab notebook.

The README says: "ReLERNN requires the use of a CUDA-Enabled NVIDIA GPU. The current version of ReLERNN has been successfully tested with tensorflow/2.2.0, cudatoolkit/10.1.243, and cudnn/7.6.5." Colab, however, only supports newer versions (CUDA 12+).
Has Google Colab been used to run ReLERNN before, and would you recommend it?
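
A quick way to see how far an environment is from the tested one is to compare installed package versions against the README values. The sketch below is an illustration only: `installed_versions` and `mismatches` are hypothetical helpers, and it checks only `tensorflow`, on the assumption that `cudatoolkit` and `cudnn` are conda packages that `importlib.metadata` cannot see.

```python
from importlib import metadata

# Version quoted from the ReLERNN README in this thread.
# cudatoolkit and cudnn are assumed to be conda packages and are not checked here.
TESTED = {"tensorflow": "2.2.0"}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatches(tested, installed):
    """Return {name: (tested, installed)} for every package whose version differs."""
    return {
        name: (want, installed.get(name))
        for name, want in tested.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    # An empty dict means the environment matches the tested versions.
    print(mismatches(TESTED, installed_versions(TESTED)))
```

An empty result would mean the installed TensorFlow matches the tested release; anything else lists the tested version next to what was found.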

@Shakirahnamuli12
Author

When I run the example script, this is the output.

2025-02-09 09:37:20.774190: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-02-09 09:37:20.774256: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-02-09 09:37:20.775750: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-09 09:37:20.782814: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-02-09 09:37:21.785091: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Warning: no demographic history file found. All training data will be simulated under demographic equilibrium.
Split chromosome: 2L...
Split chromosome: 3L...
Converting /content/ReLERNN/examples/example_output/splitVCFs/example_3L:0-742000.vcf to HDF5...
Converting /content/ReLERNN/examples/example_output/splitVCFs/example_2L:0-840000.vcf to HDF5...
Split chromosome: 3R...
Split chromosome: 2R...
Converting /content/ReLERNN/examples/example_output/splitVCFs/example_3R:0-1963000.vcf to HDF5...
Converting /content/ReLERNN/examples/example_output/splitVCFs/example_2R:0-1669000.vcf to HDF5...
Split chromosome: X...
Converting /content/ReLERNN/examples/example_output/splitVCFs/example_X:0-1250000.vcf to HDF5...
Reading HDF5: "/content/ReLERNN/examples/example_output/splitVCFs/example_2L:0-840000.hdf5"...
Reading HDF5: "/content/ReLERNN/examples/example_output/splitVCFs/example_3L:0-742000.hdf5"...
Reading HDF5: "/content/ReLERNN/examples/example_output/splitVCFs/example_2R:0-1669000.hdf5"...
Reading HDF5: "/content/ReLERNN/examples/example_output/splitVCFs/example_3R:0-1963000.hdf5"...
Reading HDF5: "/content/ReLERNN/examples/example_output/splitVCFs/example_X:0-1250000.hdf5"...

Accessibility mask found: calculating the proportion of the genome that is masked...
1.3% of genome inaccessible

Simulating with window size = 211000 bp.
Training set:
Simulate...
Process Process-7:
Process Process-8:
Validation set:
Simulate...
Process Process-9:
Process Process-10:
Test set:
Simulate...
Process Process-11:
Traceback (most recent call last):

SIMULATIONS FINISHED!

SANITY CHECK

numSegSites Min Mean Max
/usr/local/bin/ReLERNN_SIMULATE:238: RuntimeWarning: overflow encountered in scalar add
print("Simulated:\t\t\t%s\t%s\t%s" %(minSegSites, int(sum(SS)/float(len(SS))), maxSegSites))
Simulated: -7756042983262126080 101196593457430 7277900098418245632
InputVCF 2L:0-840000: 238 909 1741
InputVCF 2R:0-1669000: 411 1000 1754
InputVCF 3L:0-742000: 143 909 1777
InputVCF 3R:0-1963000: 358 1000 1759
InputVCF X:0-1250000: 127 1000 1720

ReLERNN_SIMULATE.py FINISHED!

2025-02-09 09:37:32.334610: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-02-09 09:37:32.334667: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-02-09 09:37:32.335925: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-09 09:37:32.343117: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-02-09 09:37:33.337631: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-02-09 09:37:34.496341: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-02-09 09:37:35.255496: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2256] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Traceback (most recent call last):
File "/usr/local/bin/ReLERNN_TRAIN", line 130, in <module>
main()
File "/usr/local/bin/ReLERNN_TRAIN", line 109, in main
runModels(ModelFuncPointer=GRU_TUNED84,
File "/usr/local/lib/python3.11/dist-packages/ReLERNN/helpers.py", line 343, in runModels
x,y = TrainGenerator.__getitem__(0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ReLERNN/sequenceBatchGenerator.py", line 271, in __getitem__
X, y = self.__data_generation(indices)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ReLERNN/sequenceBatchGenerator.py", line 287, in __data_generation
H = np.load(Hfilepath)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/numpy/lib/npyio.py", line 427, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/content/ReLERNN/examples/example_output/train/3615_haps.npy'
2025-02-09 09:37:39.480725: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-02-09 09:37:39.480779: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-02-09 09:37:39.482081: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-09 09:37:39.489530: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-02-09 09:37:40.501039: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Importing HDF5: "/content/ReLERNN/examples/example_output/splitVCFs/example_2L:0-840000.hdf5"...
2025-02-09 09:37:41.503249: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2025-02-09 09:37:41.555345: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2256] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Traceback (most recent call last):
File "/usr/local/bin/ReLERNN_PREDICT", line 155, in <module>
main()
File "/usr/local/bin/ReLERNN_PREDICT", line 122, in main
load_and_predictVCF(VCFGenerator=vcf_gen,
File "/usr/local/lib/python3.11/dist-packages/ReLERNN/helpers.py", line 284, in load_and_predictVCF
jsonFILE = open(network[0],"r")
^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/content/ReLERNN/examples/example_output/networks/model.json'
2025-02-09 09:37:44.876772: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-02-09 09:37:44.876820: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-02-09 09:37:44.878143: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-09 09:37:44.885344: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-02-09 09:37:45.883840: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Error: no .PREDICT.txt file found. You must run ReLERNN_PREDICT.py prior to running ReLERNN_BSCORRECT.py

END
What could be causing this?
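
The output above reads like a chain: the simulation workers print a traceback, the training step then cannot find a `.npy` file, the prediction step cannot find `model.json`, and the final step cannot find a `.PREDICT.txt` file. A small check between stages would show where the chain first breaks. This is a rough sketch under stated assumptions, not ReLERNN code: `missing_stage_outputs` is a hypothetical helper, the `*_haps.npy` pattern is inferred from the single path in the error message, and the location of the `.PREDICT.txt` file is not known from the log, so the whole project directory is searched.

```python
from pathlib import Path


def missing_stage_outputs(project_dir):
    """List, per pipeline stage, the outputs this sketch expects but cannot find."""
    root = Path(project_dir)
    problems = []

    # ReLERNN_TRAIN in the log tried to load train/3615_haps.npy.
    train_dir = root / "train"
    if not (train_dir.is_dir() and any(train_dir.glob("*_haps.npy"))):
        problems.append("SIMULATE: no *_haps.npy files in train/")

    # ReLERNN_PREDICT in the log tried to open networks/model.json.
    if not (root / "networks" / "model.json").is_file():
        problems.append("TRAIN: networks/model.json is missing")

    # ReLERNN_BSCORRECT asked for a .PREDICT.txt file (location assumed).
    if not (root.is_dir() and any(root.rglob("*.PREDICT.txt"))):
        problems.append("PREDICT: no .PREDICT.txt file found")

    return problems
```

For the run shown here, calling `missing_stage_outputs("/content/ReLERNN/examples/example_output")` after the simulation step would be expected to report the missing training files first, which points at the simulation step rather than at the later ones.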

@andrewkern
Member

I haven't tried to run this in a Google Colab instance before. Let me try it myself and get back to you. I'm guessing this has to do with package version issues.

@andrewkern
Member

After playing with this a bit, I don't see a simple way to use our software in a Google Colab environment. We need finer-grained control over which Python packages and Python version we use than Colab allows.

@Shakirahnamuli12
Author

Alright, Andrew, thank you for your response.
