
vo_gpu_next: use emulated formats only as fallback #13682

Draft: wants to merge 2 commits into master


sfan5 (Member) commented Mar 10, 2024

How to check: run ./build/mpv --vo=gpu-next --msg-level=vf=trace,vo=debug --force-window --idle video.mp4 and compare the libplacebo GPU formats table with the VO format table.

Fixes mpv-android/mpv-android#855


kasper93 (Contributor)

I don't understand this patch. Why does it select gbrpf32 as the best format in the first place? This requires software conversion before uploading to the GPU.

The fact that it is fmt->emulated on the libplacebo side is not important in this case at all. It only means that libplacebo will internally upload the data differently; in practice there is little to no performance impact. Maybe libplacebo shouldn't expose f32 formats, or mpv should be smart enough not to select f32 formats, but fmt->emulated is not the right tool for making that distinction.

sfan5 (Member, Author) commented Mar 11, 2024

This is the relevant information in that case:

[vo/gpu-next/libplacebo:debug] GPU texture formats:
[vo/gpu-next/libplacebo:debug]     NAME                 TYPE   SIZE COMP CAPS         EMU DEPTH         HOST_BITS     GLSL_TYPE  GLSL_FMT   FOURCC
[vo/gpu-next/libplacebo:debug]     rgba8                UNORM  4    RGBA S-LRbBV--H-- n   {8  8  8  8 } {8  8  8  8 } vec4       rgba8      AB24  
[vo/gpu-next/libplacebo:debug]     r8                   UNORM  1    R    S-LRbBV----- n   {8  0  0  0 } {8  0  0  0 } float      r8         R8    
[vo/gpu-next/libplacebo:debug]     rg8                  UNORM  2    RG   S-LRbBV----- n   {8  8  0  0 } {8  8  0  0 } vec2       rg8        GR88  
[vo/gpu-next/libplacebo:debug]     bgra8                UNORM  4    BGRA S-LRbBV----- n   {8  8  8  8 } {8  8  8  8 } vec4       rgba8      AR24  
[vo/gpu-next/libplacebo:debug]     r8u                  UINT   1    R    S--R-BV----- n   {8  0  0  0 } {8  0  0  0 } uint       r8ui             
[vo/gpu-next/libplacebo:debug]     rg8u                 UINT   2    RG   S--R-BV----- n   {8  8  0  0 } {8  8  0  0 } uvec2      rg8ui            
[vo/gpu-next/libplacebo:debug]     rgba8u               UINT   4    RGBA S--R-BV----- n   {8  8  8  8 } {8  8  8  8 } uvec4      rgba8ui          
[vo/gpu-next/libplacebo:debug]     r16u                 UINT   2    R    S--R-BV----- n   {16 0  0  0 } {16 0  0  0 } uint       r16ui            
[vo/gpu-next/libplacebo:debug]     rg16u                UINT   4    RG   S--R-BV----- n   {16 16 0  0 } {16 16 0  0 } uvec2      rg16ui           
[vo/gpu-next/libplacebo:debug]     rgba16u              UINT   8    RGBA S--R-BV----- n   {16 16 16 16} {16 16 16 16} uvec4      rgba16ui         
[vo/gpu-next/libplacebo:debug]     rgb8                 UNORM  3    RGB  S-LRbBV----- y   {8  8  8  0 } {8  8  8  0 } vec3                  BG24  
[vo/gpu-next/libplacebo:debug]     r16f                 FLOAT  4    R    S-LRbB------ y   {16 0  0  0 } {32 0  0  0 } float      r16f             
[vo/gpu-next/libplacebo:debug]     rg16f                FLOAT  8    RG   S-LRbB------ y   {16 16 0  0 } {32 32 0  0 } vec2       rg16f            
[vo/gpu-next/libplacebo:debug]     rgba16f              FLOAT  16   RGBA S-LRbB------ y   {16 16 16 16} {32 32 32 32} vec4       rgba16f          
[vo/gpu-next/libplacebo:debug]     rgb16f               FLOAT  12   RGB  S-L--------- y   {16 16 16 0 } {32 32 32 0 } vec3                        
[vo/gpu-next/libplacebo:debug]     rgb8u                UINT   3    RGB  S-----V----- y   {8  8  8  0 } {8  8  8  0 } uvec3                       
[vo/gpu-next/libplacebo:debug]     rgb16u               UINT   6    RGB  S-----V----- y   {16 16 16 0 } {16 16 16 0 } uvec3                       
[vo/gpu-next/libplacebo:debug]     r32f                 FLOAT  4    R    ------V----- y   {32 0  0  0 } {32 0  0  0 } float      r32f             
[vo/gpu-next/libplacebo:debug]     rg32f                FLOAT  8    RG   ------V----- y   {32 32 0  0 } {32 32 0  0 } vec2       rg32f            
[vo/gpu-next/libplacebo:debug]     rgb32f               FLOAT  12   RGB  ------V----- y   {32 32 32 0 } {32 32 32 0 } vec3                        
[vo/gpu-next/libplacebo:debug]     rgba32f              FLOAT  16   RGBA ------V----- y   {32 32 32 32} {32 32 32 32} vec4       rgba32f          
[vo/gpu-next:v] Assuming 60.000004 FPS for display sync.
[vf:trace] VO reports supported formats:
[vf:trace]   yuv444p        (2)
[vf:trace]   yuv420p        (2)
[vf:trace]   gray           (2)
[vf:trace]   nv12           (2)
[vf:trace]   argb           (2)
[vf:trace]   bgra           (2)
[vf:trace]   abgr           (2)
[vf:trace]   rgba           (2)
[vf:trace]   bgr24          (1)
[vf:trace]   rgb24          (1)
[vf:trace]   0rgb           (2)
[vf:trace]   bgr0           (2)
[vf:trace]   0bgr           (2)
[vf:trace]   rgb0           (2)
[vf:trace]   yap8           (2)
[vf:trace]   y1             (2)
[vf:trace]   gbrp1          (2)
[vf:trace]   gbrp2          (2)
[vf:trace]   gbrp3          (2)
[vf:trace]   gbrp4          (2)
[vf:trace]   gbrp5          (2)
[vf:trace]   gbrp6          (2)
[vf:trace]   yuv422p        (2)
[vf:trace]   yuv410p        (2)
[vf:trace]   yuv411p        (2)
[vf:trace]   yuvj422p       (2)
[vf:trace]   nv21           (2)
[vf:trace]   yuv440p        (2)
[vf:trace]   yuvj440p       (2)
[vf:trace]   yuva420p       (2)
[vf:trace]   ya8            (2)
[vf:trace]   gbrp           (2)
[vf:trace]   yuva422p       (2)
[vf:trace]   yuva444p       (2)
[vf:trace]   nv16           (2)
[vf:trace]   gbrap          (2)
[vf:trace]   yuvj411p       (2)
[vf:trace]   gbrpf32        (1)
[vf:trace]   gbrapf32       (1)
[vf:trace]   nv24           (2)
[vf:trace]   nv42           (2)
[vf:trace]   vuya           (2)
[vf:trace]   vuyx           (2)

You would normally map yuv420p10 to three planes of r16, but this GPU doesn't support r16 (only r16u and the emulated r16f appear in the table above).
mpv then picks the next "better" format based on which one loses the least information, and so it arrives at gbrpf32.
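The selection step described above can be illustrated with a toy model. This is a hypothetical sketch, not mpv's actual format-selection code: struct candidate, bits_lost() and pick_least_loss() are invented for illustration, and "information loss" is reduced to bits of precision dropped per component. gbrpf32 is credited with 24 bits, assuming the 24-bit significand of a 32-bit float, which is why it beats 8-bit yuv420p for a 10-bit source.

```c
#include <stddef.h>

/* Hypothetical candidate output format: a name plus the effective
 * precision per component, in bits. */
struct candidate {
    const char *name;
    int bits;
};

/* Bits of precision dropped when converting a src_bits source to c. */
int bits_lost(int src_bits, const struct candidate *c)
{
    return c->bits >= src_bits ? 0 : src_bits - c->bits;
}

/* Index of the candidate that drops the fewest bits (the earlier entry
 * wins a tie), or -1 if there are no candidates. */
int pick_least_loss(int src_bits, const struct candidate *cands, size_t n)
{
    int best = -1;
    int best_loss = 0;
    for (size_t i = 0; i < n; i++) {
        int loss = bits_lost(src_bits, &cands[i]);
        if (best < 0 || loss < best_loss) {
            best = (int)i;
            best_loss = loss;
        }
    }
    return best;
}
```

With {"yuv420p", 8} and {"gbrpf32", 24} as the only candidates, a 10-bit source selects gbrpf32 (index 1), the same outcome as in the log. Nothing in this model accounts for conversion cost, which is the gap the rest of the thread is about.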

There's nothing wrong with this logic in principle IMO.

> The fact it is fmt->emulated on libplacebo side is not important in this case at all. It only means that libplacebo internally will upload data differently, in practice there is little to no performance impact.

Dunno about Vulkan but my understanding of this matches what is written in the header:

    // If `emulated` is true, then this format doesn't actually exist on the
    // GPU as an uploadable texture format - and any apparent support is being
    // emulated (typically using compute shaders in the upload path).

As in: it's not about how libplacebo uploads it, but how the graphics driver processes it.

Makes use of the query_format fallback logic introduced in
the previous commit.

Fallback formats are likely to be extraordinarily slow.
The trigger for this issue was an Android device (of course)
where there are no reasonable formats to map yuv420p10le to, and
mpv would pick 32-bit float RGB as the next best format,
forcing a slow path in both swscale and the GLES driver.
With the new logic mpv will convert to 8-bit YUV instead, which
enables playback at a reasonable frame rate.
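The fallback rule in this commit message can be sketched as a small model. It is hypothetical and is not the query_format code from the previous commit: struct tex_fmt, struct img_fmt, query_tier() and pick_format() are invented names, and only the emulated flag mirrors the libplacebo field quoted above. An image format is rated 2 (preferred) when every plane maps to a native texture format, 1 (fallback) when any plane needs an emulated one, and 0 when some plane has no texture format at all; the highest rating wins before precision is compared.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a GPU texture format. */
struct tex_fmt {
    const char *name;
    bool supported;
    bool emulated;   /* mirrors the idea behind libplacebo's emulated flag */
};

/* Hypothetical image format: precision plus one texture format per plane. */
struct img_fmt {
    const char *name;
    int bits;
    const struct tex_fmt *planes[4];
    int num_planes;
};

/* 0 = unusable, 1 = usable only as a fallback, 2 = preferred. */
int query_tier(const struct img_fmt *f)
{
    int tier = 2;
    for (int i = 0; i < f->num_planes; i++) {
        if (!f->planes[i]->supported)
            return 0;
        if (f->planes[i]->emulated)
            tier = 1;
    }
    return tier;
}

/* Prefer the highest tier; only within a tier minimize the bits lost.
 * Returns the chosen index, or -1 if nothing is usable. */
int pick_format(int src_bits, const struct img_fmt *fmts, size_t n)
{
    int best = -1, best_tier = 0, best_loss = 0;
    for (size_t i = 0; i < n; i++) {
        int tier = query_tier(&fmts[i]);
        if (tier == 0)
            continue;
        int loss = fmts[i].bits >= src_bits ? 0 : src_bits - fmts[i].bits;
        if (tier > best_tier || (tier == best_tier && loss < best_loss)) {
            best = (int)i;
            best_tier = tier;
            best_loss = loss;
        }
    }
    return best;
}
```

Under this rule a 10-bit source lands on 8-bit yuv420p, because r8 is native (EMU n) while r32f is emulated (EMU y), and gbrpf32 is only chosen when nothing native is left. This appears to match the (2)/(1) values in the VO format list earlier in the thread, where gbrpf32, gbrapf32, rgb24 and bgr24 are reported as (1).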

kasper93 (Contributor)

> Dunno about Vulkan but my understanding of this matches what is written in the header:

Yes, it uploads the data to the texture in a somewhat roundabout way, but the "emulation" part alone shouldn't be that detrimental to performance.

> As in: it's not about how libplacebo uploads it, but how the graphics driver processes it.

I saw this issue some time ago, and at the time I was under the impression that the CPU conversion is to blame, or more specifically, selecting a 32f format for it. I understand that rejecting it based on fmt->emulated works, but I'm not sure that is the direct cause of the issue, and hence whether it is the right way to do it.

I'm not sure what to suggest here, because for that I would need to understand where exactly the bottleneck is. If it really is the emulated format, it would be better to fix this in libplacebo and not expose such formats if they are broken on certain drivers. But if the root cause is different, such rejection should be done where it actually matters.

sfan5 (Member, Author) commented Mar 14, 2024

I mean, you're probably right that the bottleneck is the conversion from yuv420p10 to gbrpf32 and not necessarily the emulated upload.
But in this case the float formats are the only non-8-bit formats, which makes them the obvious target for preventing this from happening in the first place.

Maybe @haasn has an opinion?

haasn (Member) commented Mar 14, 2024

CPU conversion from yuv420p10 to gbrpf32 is absolutely terrible and should be avoided at all costs, considering that it also requires chroma scaling and YUV conversion in addition to the very slow floating-point path.

It would be far better to pick yuv420p8 and rely on swscale dithering.
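To show what relying on dithering buys, here is a textbook 4x4 ordered (Bayer) dither that reduces 10-bit samples to 8 bits. It is only an illustration: swscale's own dithering code and tables differ, and dither_10_to_8() is a hypothetical name. The two dropped bits are not simply truncated; averaged over each 4x4 tile the output preserves them, so smooth 10-bit gradients mostly turn into fine noise rather than visible banding.

```c
#include <stdint.h>

/* Classic 4x4 Bayer threshold matrix; each value 0..15 appears once. */
static const uint8_t bayer4[4][4] = {
    { 0,  8,  2, 10},
    {12,  4, 14,  6},
    { 3, 11,  1,  9},
    {15,  7, 13,  5},
};

/* Reduce a 10-bit sample (0..1023) at pixel (x, y) to 8 bits.
 * The dropped remainder (0..3) is scaled to 0..12 and compared with the
 * threshold, so a remainder r rounds up on exactly 4*r of the 16 pixels
 * in a tile, i.e. the tile average equals v / 4 (except at the clamp). */
uint8_t dither_10_to_8(uint16_t v, int x, int y)
{
    unsigned base = v >> 2;   /* truncated 8-bit value */
    unsigned rem = v & 3u;    /* the two dropped low bits */
    if (rem * 4 > bayer4[y & 3][x & 3] && base < 255)
        base++;
    return (uint8_t)base;
}
```

For example, the 10-bit value 513 (128.25 on the 8-bit scale) comes out as 129 on 4 of the 16 pixels in a tile and as 128 on the other 12.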

kasper93 (Contributor) commented Mar 15, 2024

I just tested again on my end (without this patch).

gpu-next (bad):

V mpv     : [autoconvert:info] Converting yuv420p10 -> gbrpf32
V mpv     : [ffmpeg:v] swscaler: Lanczos scaler, from yuv420p10le to gbrpf32le using C
V mpv     : [vf:v] [out] 3840x2160 gbrpf32 rgb/bt.2020/pq/full/display CL=uhd crop=3840x2160+0+0

gpu (good):

V mpv     : [vo/gpu:v] Reported display depth: 8
V mpv     : [vo/gpu:v] Texture for plane 0: 3840x2160
V mpv     : [vo/gpu:v] Texture for plane 1: 1920x1080
V mpv     : [vo/gpu:v] Texture for plane 2: 1920x1080
V mpv     : [vo/gpu:v] Testing FBO format rgba16f
V mpv     : [vo/gpu:v] Using FBO format rgba16f.

Solution: do what vo_gpu does. It clearly and correctly uses an rgba16f FBO instead of the gbrpf32 insanity, which forces the whole scaling through unoptimized C code.
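A back-of-the-envelope calculation shows why the gbrpf32 path hurts so much at 4K. This sketch assumes tightly packed planes with no stride padding and even frame dimensions; yuv420_bytes() and gbrpf32_bytes() are hypothetical helpers. yuv420p10le stores each 10-bit sample in a 16-bit word, so it uses 2 bytes per sample.

```c
#include <stdint.h>

/* Bytes per frame of a packed 4:2:0 planar image: one full-resolution
 * luma plane plus two quarter-resolution chroma planes. */
uint64_t yuv420_bytes(uint64_t w, uint64_t h, uint64_t bytes_per_sample)
{
    uint64_t luma = w * h;
    uint64_t chroma = 2 * (w / 2) * (h / 2);
    return (luma + chroma) * bytes_per_sample;
}

/* Bytes per frame of gbrpf32: three full-resolution planes of 4-byte floats. */
uint64_t gbrpf32_bytes(uint64_t w, uint64_t h)
{
    return w * h * 3 * 4;
}
```

For 3840x2160 this gives 24,883,200 bytes for yuv420p10, 12,441,600 bytes for 8-bit yuv420p and 99,532,800 bytes for gbrpf32. So every frame swscale writes four times as much data as it reads (eight times the 8-bit yuv420p size), on top of the chroma upsampling and YUV-to-RGB math done in floating point.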

haasn (Member) commented Mar 15, 2024


That log tells us nothing, as the FBO format has nothing to do with the upload texture format. Check the [out] line.

kasper93 (Contributor)

> That log tells us nothing, as FBO format has nothing to do with the upload texture format. Check the [out] line.

gpu:

V mpv     : [vd:v] Using software decoding.
V mpv     : [vd:v] Decoder format: 3840x2160 yuv420p10 bt.2020-ncl/bt.2020/pq/limited/auto CL=uhd crop=3840x2160+0+0
V mpv     : [vd:v] Using container aspect ratio.
V mpv     : [vf:v] [in] 3840x2160 yuv420p10 bt.2020-ncl/bt.2020/pq/limited/display CL=uhd crop=3840x2160+0+0
V mpv     : [vf:v] [userdeint] 3840x2160 yuv420p10 bt.2020-ncl/bt.2020/pq/limited/display CL=uhd crop=3840x2160+0+0
V mpv     : [vf:v] [userdeint] (disabled)
V mpv     : [vf:v] [autorotate] 3840x2160 yuv420p10 bt.2020-ncl/bt.2020/pq/limited/display CL=uhd crop=3840x2160+0+0
V mpv     : [vf:v] [autorotate] (disabled)
V mpv     : [vf:v] [convert] 3840x2160 yuv420p10 bt.2020-ncl/bt.2020/pq/limited/display CL=uhd crop=3840x2160+0+0
V mpv     : [vf:v] [convert] (disabled)
V mpv     : [vf:v] [out] 3840x2160 yuv420p10 bt.2020-ncl/bt.2020/pq/limited/display CL=uhd crop=3840x2160+0+0
V mpv     : event: video-reconfig
V mpv     : [cplayer:info] VO: [gpu] 3840x2160 yuv420p10
V mpv     : [cplayer:v] VO: Description: Shader-based GPU Renderer
V mpv     : [vo/gpu:v] reconfig to 3840x2160 yuv420p10 bt.2020-ncl/bt.2020/pq/limited/display CL=uhd crop=3840x2160+0+0
V mpv     : [vo/gpu:v] Resize: 2992x1344
V mpv     : [vo/gpu:v] Window size: 2992x1344 (Borders: l=0 t=0 r=0 b=0)
V mpv     : [vo/gpu:v] Video source: 3840x2160 (1:1)
V mpv     : [vo/gpu:v] Video display: (0, 0) 3840x2160 -> (301, 0) 2389x1344
V mpv     : [vo/gpu:v] Video scale: 0.622135/0.622222
V mpv     : [vo/gpu:v] OSD borders: l=301 t=0 r=302 b=0
V mpv     : [vo/gpu:v] Video borders: l=301 t=0 r=302 b=0
V mpv     : [vo/gpu:v] Reported display depth: 8
V mpv     : [vo/gpu:v] Texture for plane 0: 3840x2160
V mpv     : [vo/gpu:v] Texture for plane 1: 1920x1080
V mpv     : [vo/gpu:v] Texture for plane 2: 1920x1080
V mpv     : [vo/gpu:v] Testing FBO format rgba16f
V mpv     : [vo/gpu:v] Using FBO format rgba16f.
V mpv     : event: video-reconfig

gpu-next:

V mpv     : [vd:v] Using software decoding.
V mpv     : [vd:v] Decoder format: 3840x2160 yuv420p10 bt.2020-ncl/bt.2020/pq/limited/auto CL=uhd crop=3840x2160+0+0
V mpv     : [vd:v] Using container aspect ratio.
V mpv     : [vf:v] [in] 3840x2160 yuv420p10 bt.2020-ncl/bt.2020/pq/limited/display CL=uhd crop=3840x2160+0+0
V mpv     : [vf:v] [userdeint] 3840x2160 yuv420p10 bt.2020-ncl/bt.2020/pq/limited/display CL=uhd crop=3840x2160+0+0
V mpv     : [vf:v] [userdeint] (disabled)
V mpv     : [vf:v] [autorotate] 3840x2160 yuv420p10 bt.2020-ncl/bt.2020/pq/limited/display CL=uhd crop=3840x2160+0+0
V mpv     : [vf:v] [autorotate] (disabled)
V mpv     : [vf:v] [convert] 3840x2160 yuv420p10 bt.2020-ncl/bt.2020/pq/limited/display CL=uhd crop=3840x2160+0+0
V mpv     : [autoconvert:info] Converting yuv420p10 -> gbrpf32
V mpv     : [ffmpeg:v] swscaler: Lanczos scaler, from yuv420p10le to gbrpf32le using C
V mpv     : [vf:v] [out] 3840x2160 gbrpf32 rgb/bt.2020/pq/full/display CL=uhd crop=3840x2160+0+0
V mpv     : event: video-reconfig

sfan5 (Member, Author) commented Mar 15, 2024

IIRC the difference here was that vo_gpu would use r16ui for mapping planes, while libplacebo apparently can't or doesn't.
Incidentally, 10-bit never worked on my phone's GPU, so maybe there's something that makes the uint formats less suitable for rendering? (Although I fully believe that to just be a driver bug.)
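To illustrate what mapping planes to r16ui involves: yuv420p10 keeps each 10-bit sample in the low bits of a 16-bit word. An integer texture hands that raw value to the shader, which must normalize it itself, whereas a normalized r16 texture would return value / 65535 and need rescaling by 65535 / 1023. The following is a CPU-side model of both paths under those assumptions, not vo_gpu's actual shader code; sample_uint16(), normalize_10bit() and unorm16_rescaled() are hypothetical names. One detail that may be relevant to the question above: in OpenGL and GLES, integer textures cannot be linearly filtered, so any filtering has to happen after this conversion.

```c
#include <stdint.h>

/* An integer texture (e.g. r16ui) returns the stored word unchanged. */
uint32_t sample_uint16(uint16_t stored)
{
    return stored;
}

/* Map a raw 10-bit value to 0.0..1.0 by dividing by 2^10 - 1 = 1023. */
float normalize_10bit(uint32_t raw)
{
    const float max10 = (float)((1u << 10) - 1);
    return (float)raw / max10;
}

/* A normalized 16-bit texture returns stored / 65535; multiplying by
 * 65535 / 1023 restores the 0.0..1.0 range of the 10-bit data. */
float unorm16_rescaled(uint16_t stored)
{
    float sampled = stored / 65535.0f;
    return sampled * (65535.0f / 1023.0f);
}
```

Both paths produce the same normalized value (up to float rounding); the difference lies in which texture formats the GPU actually offers and what operations it allows on them.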

haasn (Member) commented Mar 15, 2024

> IIRC the difference here was that vo_gpu would use r16ui for mapping planes while libplacebo apparently can't/doesn't.

Then that sounds like the actual bug here, not the choice of format.

sfan5 added the priority:on-ice label (may be revisited later) on Mar 15, 2024
sfan5 (Member, Author) commented Mar 15, 2024

@haasn #13706

sfan5 marked this pull request as draft on March 16, 2024 14:39
Labels: priority:on-ice (may be revisited later)

Successfully merging this pull request may close these issues:
gpu-next: Unwatchable framerate when software decoding HEVC 10-bit video

3 participants