[Bug]: rocBLAS error: Cannot read ...\AMD\ROCm\6.1\bin\/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1036 #551

Open
3 of 6 tasks
marnel-estrada opened this issue Oct 25, 2024 · 7 comments

Comments

@marnel-estrada

marnel-estrada commented Oct 25, 2024

Checklist

  • The issue exists after disabling all extensions
  • The issue exists on a clean installation of webui
  • The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • The issue exists in the current version of the webui
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

I followed this guide. When I run webui-user.bat, this error is the end result. I searched for the error and found this. However, I don't understand what "setting the environment variable export HSA_OVERRIDE_GFX_VERSION=10.3.0" means. I don't think that's the same as setting the PATH variable. I have updated to the current drivers. My GPU is an RX 7800.

Steps to reproduce the problem

  1. Follow the guide here.
  2. The error shows when running webui-user.bat.

What should have happened?

I expected it to proceed with no errors.

What browsers do you use to access the UI ?

Mozilla Firefox

Sysinfo

I don't know what to do here because the WebUI never opens. I tried --dump-sysinfo, but it fails with Python errors:

D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Traceback (most recent call last):
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\launch.py", line 48, in <module>
    main()
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\launch.py", line 29, in main
    filename = launch_utils.dump_sysinfo()
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\modules\launch_utils.py", line 700, in dump_sysinfo
    text = sysinfo.get()
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\modules\sysinfo.py", line 46, in get
    res = get_dict()
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\modules\sysinfo.py", line 119, in get_dict
    "Extensions": get_extensions(enabled=True, fallback_disabled_extensions=config.get('disabled_extensions', [])),
AttributeError: 'str' object has no attribute 'get'
Press any key to continue . . .

Console logs

venv "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: v1.10.1-amd-12-gae5ff7a2
Commit hash: ae5ff7a232cd898f653e4fffb36f507b54de8b72
Using ZLUDA in D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\.zluda
ROCm agents: ['gfx1036', 'gfx1101'], using gfx1036
Skipping onnxruntime installation.
You are up to date with the most recent release.
D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments: --use-zluda --update-check --skip-ort

rocBLAS error: Cannot read C:\Program Files\AMD\ROCm\6.1\bin\/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1036

rocBLAS error: Could not initialize Tensile host:
regex_error(error_backref): The expression contained an invalid back reference.
Press any key to continue . . .

Additional information

I updated the GPU driver prior to following the steps.

@marnel-estrada
Author

I tried adding this in webui.bat but it still showed the same error:
set HSA_OVERRIDE_GFX_VERSION=10.3.0

@lshqqytiger
Owner

Don't set HSA_OVERRIDE_GFX_VERSION. Instead, add --device-id 1 or set HIP_VISIBLE_DEVICES=1.

@marnel-estrada
Author

marnel-estrada commented Oct 25, 2024

I used those and got another set of errors:

venv "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: v1.10.1-amd-12-gae5ff7a2
Commit hash: ae5ff7a232cd898f653e4fffb36f507b54de8b72
Using ZLUDA in D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\.zluda
ROCm agents: ['gfx1101'], using gfx1101
Skipping onnxruntime installation.
You are up to date with the most recent release.
D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Invalid device id: str
Traceback (most recent call last):
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\modules\errors.py", line 98, in run
    code()
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\modules\devices.py", line 118, in enable_tf32
    if cuda_no_autocast():
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\modules\devices.py", line 30, in cuda_no_autocast
    torch.cuda.get_device_capability(device_id) == (7, 5)
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\__init__.py", line 430, in get_device_capability
    prop = get_device_properties(device)
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\__init__.py", line 447, in get_device_properties
    raise AssertionError("Invalid device id")
AssertionError: Invalid device id

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\launch.py", line 48, in <module>
    main()
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\launch.py", line 39, in main
    prepare_environment()
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\modules\launch_utils.py", line 671, in prepare_environment
    from modules import devices
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\modules\devices.py", line 125, in <module>
    errors.run(enable_tf32, "Enabling TF32")
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\modules\errors.py", line 100, in run
    display(task, e)
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\modules\errors.py", line 68, in display
    te = traceback.TracebackException.from_exception(e)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\traceback.py", line 572, in from_exception
    return cls(type(exc), exc, exc.__traceback__, *args, **kwargs)
AttributeError: 'str' object has no attribute '__traceback__'
Press any key to continue . . .

I also saw here that gfx1036 is not supported.

@marnel-estrada
Author

I edited get_device_properties() in torch/cuda/__init__.py to print the device values:

def get_device_properties(device: _device_t) -> _CudaDeviceProperties:
    r"""Get the properties of a device.

    Args:
        device (torch.device or int or str): device for which to return the
            properties of the device.

    Returns:
        _CudaDeviceProperties: the properties of the device
    """
    _lazy_init()  # will define _get_device_properties
    print("device from parameter: " + str(device))
    print("Device Count: " + str(device_count()))
    device = _get_device_index(device, optional=True)
    print("device after _get_device_index(): " + str(device))
    if device < 0 or device >= device_count():
        raise AssertionError("Invalid device id")
    return _get_device_properties(device)  # type: ignore[name-defined]

This is what I got. Not sure if it helps:

venv "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: v1.10.1-amd-12-gae5ff7a2
Commit hash: ae5ff7a232cd898f653e4fffb36f507b54de8b72
Using ZLUDA in D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\.zluda
ROCm agents: ['gfx1101'], using gfx1101
Skipping onnxruntime installation.
You are up to date with the most recent release.
D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
device from parameter: 0
Device Count: 1
device after _get_device_index(): 0
device from parameter: 0
Device Count: 1
device after _get_device_index(): 0
device from parameter: 0
Device Count: 1
device after _get_device_index(): 0
device from parameter: 1
Device Count: 1
device after _get_device_index(): 1
Invalid device id: str
Traceback (most recent call last):
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\modules\errors.py", line 98, in run
    code()
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\modules\devices.py", line 118, in enable_tf32
    if cuda_no_autocast():
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\modules\devices.py", line 30, in cuda_no_autocast
    torch.cuda.get_device_capability(device_id) == (7, 5)
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\__init__.py", line 430, in get_device_capability
    prop = get_device_properties(device)
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\__init__.py", line 450, in get_device_properties
    raise AssertionError("Invalid device id")
AssertionError: Invalid device id

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\launch.py", line 48, in <module>
    main()
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\launch.py", line 39, in main
    prepare_environment()
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\modules\launch_utils.py", line 671, in prepare_environment
    from modules import devices
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\modules\devices.py", line 125, in <module>
    errors.run(enable_tf32, "Enabling TF32")
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\modules\errors.py", line 100, in run
    display(task, e)
  File "D:\AI\sd\SD-Zluda\stable-diffusion-webui-amdgpu\modules\errors.py", line 68, in display
    te = traceback.TracebackException.from_exception(e)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\traceback.py", line 572, in from_exception
    return cls(type(exc), exc, exc.__traceback__, *args, **kwargs)
AttributeError: 'str' object has no attribute '__traceback__'
Press any key to continue . . .

On the last call, _get_device_index() returned 1 while the device count was 1, which triggers the AssertionError.
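
One possible reading of this output, offered as an assumption rather than a confirmed diagnosis: once HIP_VISIBLE_DEVICES=1 hides the gfx1036 agent, the remaining GPU is renumbered to index 0, so also passing --device-id 1 asks for a device that is no longer there. The sketch below models that renumbering in plain Python. `visible_devices` and `check_device_id` are hypothetical helpers for illustration, not part of torch or ZLUDA.

```python
def visible_devices(all_agents, hip_visible_devices=None):
    """Return the agents left after a HIP_VISIBLE_DEVICES-style filter.

    Assumption: the variable holds comma-separated indices into the full
    agent list, and the surviving agents are renumbered starting at 0.
    """
    if hip_visible_devices is None:
        return list(all_agents)
    indices = [int(i) for i in hip_visible_devices.split(",") if i.strip()]
    return [all_agents[i] for i in indices if 0 <= i < len(all_agents)]


def check_device_id(device_id, agents):
    """Mirror the range check that raised AssertionError in the traceback."""
    if device_id < 0 or device_id >= len(agents):
        raise AssertionError("Invalid device id")
    return agents[device_id]


agents = ["gfx1036", "gfx1101"]          # from the first console log
visible = visible_devices(agents, "1")   # models `set HIP_VISIBLE_DEVICES=1`
print(visible)                           # ['gfx1101']
print(check_device_id(0, visible))       # gfx1101
try:
    check_device_id(1, visible)          # models adding `--device-id 1` as well
except AssertionError as exc:
    print(exc)                           # Invalid device id
```

If this reading is right, using only one of the two options (not both together) would avoid the out-of-range index.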

@lshqqytiger
Owner

--device-id is an untested feature on ZLUDA. I couldn't test it because I don't have an APU or a second AMD GPU. Anyway, set HIP_VISIBLE_DEVICES=1 should work.

@CS1o

CS1o commented Oct 28, 2024

HIP SDK found your APU (gfx1036) before your dedicated GPU.

You can fix that by adding set HIP_VISIBLE_DEVICES=1 to webui-user.bat.

Alternatively, open Device Manager and, under Display adapters, disable Radeon(TM) Graphics.
Then relaunch webui-user.bat.
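
The index to use can be read off the "ROCm agents" line the launcher prints. A minimal sketch, assuming gfx1036 is the integrated GPU (as stated above) and that the variable has to be set before the HIP runtime is initialized; `pick_discrete_index` and `INTEGRATED_ARCHS` are hypothetical names, not part of the webui:

```python
import os

# Assumption: architectures treated as integrated GPUs. Only gfx1036 comes
# from this thread; extend the set for other APUs.
INTEGRATED_ARCHS = {"gfx1036"}


def pick_discrete_index(agents, integrated=INTEGRATED_ARCHS):
    """Return the index of the first agent that is not a known integrated GPU."""
    for index, arch in enumerate(agents):
        if arch not in integrated:
            return index
    raise ValueError("no discrete GPU found in agent list")


agents = ["gfx1036", "gfx1101"]  # as printed in "ROCm agents: [...]"
index = pick_discrete_index(agents)
# Equivalent of `set HIP_VISIBLE_DEVICES=1`, but for the current process only.
os.environ["HIP_VISIBLE_DEVICES"] = str(index)
print(os.environ["HIP_VISIBLE_DEVICES"])  # 1
```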

Also, it's not recommended to use Python from the MS Store.
From your log:

File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\traceback.py", line 572, in from_exception
    return cls(type(exc), exc, exc.__traceback__, *args, **kwargs)
AttributeError: 'str' object has no attribute '__traceback__'

It seems you have an old Python version installed via the MS Store.
Uninstall it via Settings -> Apps -> Apps and Features.
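
To check which interpreter a venv was built from, something like the following could help. It is only a heuristic sketch, assuming that MS Store installs always live under a WindowsApps folder as in the path quoted above; `is_store_python` is a hypothetical helper.

```python
import sys


def is_store_python(path):
    """Heuristic: Microsoft Store Python lives under a WindowsApps folder."""
    normalized = path.replace("/", "\\").lower()
    return "\\windowsapps\\" in normalized


# Path taken from the traceback quoted above.
store_path = (
    r"C:\Program Files\WindowsApps"
    r"\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0"
    r"\lib\traceback.py"
)
print(is_store_python(store_path))  # True

# sys.base_prefix points at the base installation, even inside a venv.
print(is_store_python(sys.base_prefix))
```

Run it with the venv's Python.exe; if the second line prints True, the venv was created from the Store install and would need to be recreated after installing Python another way.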

@TiberiumCat

An alternative option if you're not using the APU: disable it in Device Manager.
On the one hand, you have to remember to disable it again every time you update video drivers. On the other hand, nothing else is going to give you issues by trying to use it.
