failed to convert Vulkan driver statistics to RGA format on Linux #71

Open
farnoy opened this issue Sep 9, 2020 · 2 comments

farnoy commented Sep 9, 2020

Hi,

I'm trying to get a very simple pipeline analyzed, but I can't get RGA to work in online mode. The output just says Error: failed to convert Vulkan driver statistics to RGA format.

I believe it's failing right here

status = (result ? beStatus_SUCCESS : beStatus_Vulkan_ParseStatsFailed);

I'm on Linux x64 5.8.7, AMDVLK 2020.Q3.4 and:

$ vulkaninfo | rg PhysicalDeviceProp -A10
VkPhysicalDeviceProperties:
---------------------------
	apiVersion     = 4202646 (1.2.150)
	driverVersion  = 8388763 (0x80009b)
	vendorID       = 0x1002
	deviceID       = 0x66af
	deviceType     = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
	deviceName     = AMD Radeon VII
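
As a side note on the `apiVersion` line above: the number in parentheses is the standard Vulkan packing of major, minor and patch into one 32-bit integer. A minimal sketch of the decoding follows; the function name `vk_version` is made up for illustration. `driverVersion` is vendor-defined, so the same decoding is not assumed to apply to it.

```shell
# Decode a packed Vulkan API version (hypothetical helper, not part of any tool).
# Layout: major in bits 22-28, minor in bits 12-21, patch in bits 0-11.
vk_version() {
    v=$1
    printf '%d.%d.%d\n' \
        $(( ($v >> 22) & 0x7f )) \
        $(( ($v >> 12) & 0x3ff )) \
        $(( $v & 0xfff ))
}

vk_version 4202646   # prints 1.2.150
```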

Is this an incompatibility with the latest AMDVLK release?

Also a few minor questions if I may:

  • Is the PSO format generated with Fossilize? I saw the one for DX12 and it looks completely custom, while the Vulkan one is just a JSON dump
  • Will there be support for source mapping in the future? I wanted to create a Language Server Protocol to embed live ISA & register pressure in my code editor while editing shaders, but there doesn't seem to be a way to map this analysis back to the source.

AmitBM commented Sep 9, 2020

Hi farnoy,

  1. What Linux variant are you using? Note that RGA officially supports only Ubuntu.
  2. Is there a Vulkan ICD manifest file present under /opt/amdgpu-pro/etc/vulkan/icd.d/amd_icd64.json on your system? If you set the VK_ICD_FILENAMES environment variable to /opt/amdgpu-pro/etc/vulkan/icd.d/amd_icd64.json - are you still seeing the same error? This should force the amdgpu-pro driver to be used (which is the driver RGA relies on).
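
A minimal sketch of the check in item 2, assuming a POSIX shell. The helper name `first_existing_icd` is hypothetical; the two manifest paths are the ones mentioned in this thread and may differ on other installations.

```shell
# Print the first ICD manifest from the argument list that exists as a regular file.
# Prints nothing and returns non-zero when none of the candidates exist.
first_existing_icd() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage (adjust the paths to your installation):
#   icd=$(first_existing_icd \
#       /opt/amdgpu-pro/etc/vulkan/icd.d/amd_icd64.json \
#       /usr/share/vulkan/icd.d/amd_icd64.json) &&
#   VK_ICD_FILENAMES="$icd" vulkaninfo
```

Setting the variable only for the one command keeps the rest of the session on whatever driver the loader would normally pick.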

-=-=-=-

To your questions:

  • The .gpso and .cpso file formats were derived from the Fossilize format. At the time of development, Fossilize packed the SPIR-V binaries inside the file and also supported packing multiple pipelines within the same Fossilize file. RGA's .cpso and .gpso files describe a single pipeline and do not store the SPIR-V binaries.
  • I assume that you are referring to correlation from GLSL/SPIR-V to ISA disassembly, similar to what RGA supports in ROCm OpenCL mode. This feature is on our roadmap, but since it requires updates to AMD's shader compiler, it is not expected to be available soon.


farnoy commented Sep 9, 2020

I'm on Arch Linux, and my AMDVLK installation is built from source with this script

I already used VK_ICD_FILENAMES in the original report, because I have a separate Mesa radv stack that I didn't want RGA to use.

Thanks for taking my questions. I was indeed referring to the OpLine instructions that both glslang and dxc can output. It should be a useful feature for livereg and/or assembly when it's ready.

I did a bit more digging and found something interesting. I've modified the rga bash script wrapper to execute rga-bin --verbose "$@". This showed me intermediate commands in the GUI window. The full output was:


Building Vulkan project "asd" for gfx906

./rga -s vulkan --isa "/home/kuba/RadeonGPUAnalyzer/projects/asd/Output/Clone0/disassem.txt" --parse-isa --line-numbers --analysis "/home/kuba/RadeonGPUAnalyzer/projects/asd/Output/Clone0/resourceUsage.csv" -b "/home/kuba/RadeonGPUAnalyzer/projects/asd/Output/Clone0/codeobj.bin" --log "/home/kuba/.local/share/RadeonGPUAnalyzer/rga-cli-20200909-214907.log" --icd "/usr/share/vulkan/icd.d/amd_icd64.json" --glslang-opt "@--target-env vulkan1.1@" --compiler-bin "/home/kuba/1.2.148.1/x86_64/bin" --session-metadata "/home/kuba/RadeonGPUAnalyzer/projects/asd/Output/Clone0/gfx906_cliInvocation.xml" --asic gfx906 --pso "/home/kuba/RadeonGPUAnalyzer/projects/asd/Clone0/Pipeline0.gpso" --vert "/data/renderer/src/shaders/gui.vert" --frag "/data/renderer/src/shaders/gui.frag"

Info: forcing the Vulkan runtime to load a custom ICD: /usr/share/vulkan/icd.d/amd_icd64.json

Launching external process:
/home/kuba/rga/Vulkan//VulkanBackend --list-targets --icd /usr/share/vulkan/icd.d/amd_icd64.json
Target GPU detected:

gfx906 (Vega)
AMD Radeon VII

Pre-compiling vertex shader file (/data/renderer/src/shaders/gui.vert) to SPIR-V binary (/home/kuba/.rga/GPUOpen/rga/all-devices_rga-temp-out2393283_vert.spv)... Launching external process:
/home/kuba/1.2.148.1/x86_64/bin/glslangValidator --target-env vulkan1.1 -V -o /home/kuba/.rga/GPUOpen/rga/all-devices_rga-temp-out2393283_vert.spv /data/renderer/src/shaders/gui.vert
succeeded.
Pre-compiling fragment shader file (/data/renderer/src/shaders/gui.frag) to SPIR-V binary (/home/kuba/.rga/GPUOpen/rga/all-devices_rga-temp-out2393283_frag.spv)... Launching external process:
/home/kuba/1.2.148.1/x86_64/bin/glslangValidator --target-env vulkan1.1 -V -o /home/kuba/.rga/GPUOpen/rga/all-devices_rga-temp-out2393283_frag.spv /data/renderer/src/shaders/gui.frag
succeeded.
Building for gfx906... Launching external process:
/home/kuba/rga/Vulkan//VulkanBackend --target gfx906 --vert /home/kuba/.rga/GPUOpen/rga/all-devices_rga-temp-out2393283_vert.spv --vert-isa /home/kuba/RadeonGPUAnalyzer/projects/asd/Output/Clone0/gfx906_disassem_vert.txt --vert-stats /home/kuba/RadeonGPUAnalyzer/projects/asd/Output/Clone0/gfx906_resourceUsage_vert.csv --frag /home/kuba/.rga/GPUOpen/rga/all-devices_rga-temp-out2393283_frag.spv --frag-isa /home/kuba/RadeonGPUAnalyzer/projects/asd/Output/Clone0/gfx906_disassem_frag.txt --frag-stats /home/kuba/RadeonGPUAnalyzer/projects/asd/Output/Clone0/gfx906_resourceUsage_frag.csv --bin /home/kuba/RadeonGPUAnalyzer/projects/asd/Output/Clone0/gfx906_codeobj.bin --pso /home/kuba/RadeonGPUAnalyzer/projects/asd/Clone0/Pipeline0.gpso --icd /usr/share/vulkan/icd.d/amd_icd64.json

Using Vulkan ICD from custom location: /usr/share/vulkan/icd.d/amd_icd64.json

failed.
Error: failed to convert Vulkan driver statistics to RGA format.

However, when I ran the three leaf commands manually in a shell (the two glslangValidator invocations and VulkanBackend), everything worked fine. For example:

$ /home/kuba/rga/Vulkan//VulkanBackend --target gfx906 \
  --vert /home/kuba/.rga/GPUOpen/rga/all-devices_rga-temp-out2393283_vert.spv \
  --frag /home/kuba/.rga/GPUOpen/rga/all-devices_rga-temp-out2393283_frag.spv \
  --frag-stats /dev/stdout \
  --pso /home/kuba/RadeonGPUAnalyzer/projects/asd/Clone0/Pipeline0.gpso \
  --icd /usr/share/vulkan/icd.d/amd_icd64.json

Using Vulkan ICD from custom location: /usr/share/vulkan/icd.d/amd_icd64.json
Statistics:
    - shaderStageMask                           = 16
    - resourceUsage.numUsedVgprs                = 24
    - resourceUsage.numUsedSgprs                = 14
    - resourceUsage.ldsSizePerLocalWorkGroup    = 65536
    - resourceUsage.ldsUsageSizeInBytes         = 0
    - resourceUsage.scratchMemUsageInBytes      = 0
    - numPhysicalVgprs                          = 256
    - numPhysicalSgprs                          = 800
    - numAvailableVgprs                         = 256
    - numAvailableSgprs                         = 104

So when the GUI invokes it, it fails, but when I do the same thing from the shell, it works.
EDIT: never mind, it's the GUI that throws an error
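
The statistics block that VulkanBackend printed above is regular enough to turn into key/value pairs for inspection. The sketch below is only an illustration with a made-up function name (`stats_to_csv`); it is not RGA's converter, and the `key,value` output is not claimed to match RGA's resourceUsage CSV layout.

```shell
# Read VulkanBackend's "    - key = value" statistics lines on stdin
# and print them as "key,value". Other lines are ignored.
stats_to_csv() {
    awk -F'=' '
        /^[ \t]*-[ \t]/ {
            key = $1
            val = $2
            sub(/^[ \t]*-[ \t]*/, "", key)   # drop the leading "- " bullet
            sub(/[ \t]+$/, "", key)          # drop padding before "="
            gsub(/[ \t]/, "", val)           # drop spaces around the value
            print key "," val
        }'
}

# Usage: append "| stats_to_csv" to a VulkanBackend command that writes
# its statistics to /dev/stdout, as in the invocation above.
```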

I tried redirecting the VulkanBackend binary with a script like this to enable api_dump:

#!/usr/bin/fish

ls ~/.rga/GPUOpen/rga/*.spv

set -x VK_INSTANCE_LAYERS VK_LAYER_LUNARG_api_dump
set -x VK_APIDUMP_LOG_FILENAME /tmp/vulkanbackend-api-dump

eval (dirname (status -f))/VulkanBackend-bin $argv
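
For readers who don't use fish, here is a rough POSIX sh equivalent of the wrapper above. It is a sketch under the same assumption as the fish script: the real binary has been renamed to VulkanBackend-bin. The function name `run_with_api_dump` is made up, and the directory containing the binary is passed explicitly.

```shell
# Run <dir>/VulkanBackend-bin with the api_dump layer enabled.
# The two variables are set for this one process only.
run_with_api_dump() {
    backend_dir=$1
    shift
    VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_api_dump \
    VK_APIDUMP_LOG_FILENAME=/tmp/vulkanbackend-api-dump \
    "$backend_dir/VulkanBackend-bin" "$@"
}

# Usage inside a wrapper script saved as "VulkanBackend":
#   run_with_api_dump "$(dirname "$0")" "$@"
```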

I also verified that the .spv files exist (they do) and set up API dump. But everything exits cleanly with the last API calls being:

Thread 0, Frame 0:
vkGetShaderInfoAMD(device, pipeline, shaderStage, infoType, pInfoSize, pInfo) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 0x37aa990
    pipeline:                       VkPipeline = 0x326f760
    shaderStage:                    VkShaderStageFlagBits = 16 (VK_SHADER_STAGE_FRAGMENT_BIT)
    infoType:                       VkShaderInfoTypeAMD = VK_SHADER_INFO_TYPE_DISASSEMBLY_AMD (2)
    pInfoSize:                      size_t* = 2156
    pInfo:                          void* = 0x345c860

Thread 0, Frame 0:
vkGetShaderInfoAMD(device, pipeline, shaderStage, infoType, pInfoSize, pInfo) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 0x37aa990
    pipeline:                       VkPipeline = 0x326f760
    shaderStage:                    VkShaderStageFlagBits = 16 (VK_SHADER_STAGE_FRAGMENT_BIT)
    infoType:                       VkShaderInfoTypeAMD = VK_SHADER_INFO_TYPE_STATISTICS_AMD (0)
    pInfoSize:                      size_t* = 72
    pInfo:                          void* = 0x7ffcf9966140

Thread 0, Frame 0:
vkDestroyPipeline(device, pipeline, pAllocator) returns void:
    device:                         VkDevice = 0x37aa990
    pipeline:                       VkPipeline = 0x326f760
    pAllocator:                     const VkAllocationCallbacks* = NULL

So for each stage, it's collecting the binary, disassembly and statistics, all as expected. The only weird thing is that it returns fictional devices. I guess that's configured out of band because I only see the effects:

Thread 0, Frame 0:
vkGetPhysicalDeviceProperties(physicalDevice, pProperties) returns void:
    physicalDevice:                 VkPhysicalDevice = 0x384ab10
    pProperties:                    VkPhysicalDeviceProperties* = 0x7ffcf86bb060:
        apiVersion:                     uint32_t = 0
        driverVersion:                  uint32_t = 0
        vendorID:                       uint32_t = 0
        deviceID:                       uint32_t = 31
        deviceType:                     VkPhysicalDeviceType = VK_PHYSICAL_DEVICE_TYPE_OTHER (0)
        deviceName:                     char[VK_MAX_PHYSICAL_DEVICE_NAME_SIZE] = "NAVI14:gfx1012"
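
To list every device name that appears in such a dump, a small filter is enough. This is a sketch with a made-up function name (`dump_device_names`); it assumes the text format shown above and the log path set in the wrapper script.

```shell
# Read an api_dump text log on stdin and print each distinct
# deviceName value once, sorted.
dump_device_names() {
    sed -n 's/.*deviceName:.*= "\([^"]*\)".*/\1/p' | sort -u
}

# Usage:
#   dump_device_names < /tmp/vulkanbackend-api-dump
```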

The GUI, on the other hand, only has a problem with my vertex shader. If I remove it from the pipeline, offline mode is used, but I don't see the error about driver statistics. If I remove the fragment shader and leave only the vertex shader, it fails again. I'm not sure what makes it so special; it's a very simple shader:

#version 450

#extension GL_EXT_scalar_block_layout: require

layout(push_constant, scalar) uniform PushConstants {
    vec2 scale;
    vec2 translate;
} pushConstants;

layout (location = 0) in vec2 pos;
layout (location = 1) in vec2 uv;
layout (location = 2) in vec4 col;

layout (location = 0) out vec4 out_color;
layout (location = 1) out vec2 out_uv;

void main() {
  out_color = col;
  out_uv = uv;
  gl_Position = vec4(pos * pushConstants.scale + pushConstants.translate, 0, 1);
  gl_Position.y *= -1.0;
}

I hope this helps. I understand that only Ubuntu is officially supported, but seeing as multiple other components are working fine, this seems to be a legitimate issue.
