Skip to content

Conversation

@ONykyf
Copy link
Contributor

@ONykyf ONykyf commented Nov 23, 2025

Adding asyncFlipPrivateKeyRec to modesetting has revealed one more problem. Current code walks along all screens and initializes screen resources, then gc's, stipples, root windows for each of them, hence after the first screen registering new private keys is no more possible. This crashes modesetting driver if it is not initialized before others.

Namely, after screen 0 is initialized, and root window for it is created, global_keys[PRIVATE_WINDOW].created becomes non-zero, hence no new private keys of this type can be registered, and modeset(1) fails at dixRegisterPrivatekey().

This patch makes screen resources for all screen initialize first, hence all necessary private keys (including of the type PRIVATE_WINDOW) are initialized before root windows are created. Tests show that it prevents occasional crashes on NVidia390 and constant crashes on NVidia 470, and makes no difference for NVidia 340 because it is not modesetting capable.

Current code walks along all screens and initializes screen resources,
then gc's, stipples, root windows for each of them, hence after
the first screen registering new private keys is no more possible.
This crashes modesetting driver if it is not initialized before others.

This patch makes screen resources for all screen initialize first, hence
all necessary private keys (including of the type PRIVATE_WINDOW) are
initialized before root windows are created.

Signed-off-by: Oleh Nykyforchyn <[email protected]>
@ONykyf ONykyf requested review from a team, metux and stefan11111 November 23, 2025 12:58
@stefan11111
Copy link
Contributor

@ONykyf Are you sure the problem is recent?
We were already registering other private keys before async flips.

Looking at 77830f2 ,
globals are 0-initialized regardless of if they are declared static or not.

Are you sure you are not hitting something else?

@stefan11111
Copy link
Contributor

Other drivers like xf86-video-intel also register private keys, so this should affect more than modesetting.

Can you post the xorg.conf you used when you hit this?

@ONykyf
Copy link
Contributor Author

ONykyf commented Nov 23, 2025

@ONykyf Are you sure the problem is recent? We were already registering other private keys before async flips.

Yes because it's OK with xlibre-xserver-25.0.0.15, which has no async...

GDB debugging shows this.

@ONykyf
Copy link
Contributor Author

ONykyf commented Nov 23, 2025

Another thing that looks strange to me that xlibre-xserver-master initializes modeset(1) together with NVIDIA(0) for an NVidia card, while xlibre-xserver-25.0.0.15 does this only for NVIDIA(0), see log files stripped off time marks to simplify the comparison:

Xorg.0-b.log

Xorg.0-g.log

@stefan11111
Copy link
Contributor

@ONykyf Support for server generations was recently dropped, and I suspect we did a bad job with it.
Does this still happen on 434331e ?

@ONykyf
Copy link
Contributor Author

ONykyf commented Nov 23, 2025

@ONykyf Support for server generations was recently dropped, and I suspect we did a bad job with it. Does this still happen on 434331e ?

Now I have to go, will return at the evening and check this, if necessary.

@stefan11111
Copy link
Contributor

Looks like there is an actual problem here.
If I use the fbdev driver on the first screen, and the modesetting driver on the second screen, this is what happens on screen teardown:

waiting for X server to shut down X connection to :0 broken (explicit kill or server shutdown).
double free or corruption (!prev)

Backtrace:
0: X (xorg_backtrace+0x56) [0x5622e0b97a81]
1: X (0x5622e09ff000+0x19e1b2) [0x5622e0b9d1b2]
2: /lib64/libc.so.6 (0x7fd3d5c05000+0x3cbb0) [0x7fd3d5c41bb0]
3: /lib64/libc.so.6 (0x7fd3d5c05000+0x91eec) [0x7fd3d5c96eec]
4: /lib64/libc.so.6 (gsignal+0x12) [0x7fd3d5c41a82]
5: /lib64/libc.so.6 (abort+0x24) [0x7fd3d5c29f3b]
6: /lib64/libc.so.6 (0x7fd3d5c05000+0x260fd) [0x7fd3d5c2b0fd]
7: /lib64/libc.so.6 (0x7fd3d5c05000+0x9b8a7) [0x7fd3d5ca08a7]
8: /lib64/libc.so.6 (0x7fd3d5c05000+0x9d9ec) [0x7fd3d5ca29ec]
9: /lib64/libc.so.6 (0x7fd3d5c05000+0x9db10) [0x7fd3d5ca2b10]
10: /lib64/libc.so.6 (__libc_free+0x180) [0x7fd3d5ca5740]
11: X (_dixFreeObjectWithPrivates+0x30) [0x5622e0a9211b]
12: X (DeleteWindow+0x1d8) [0x5622e0aa85d7]
13: X (0x5622e09ff000+0x9d16d) [0x5622e0a9c16d]
14: X (FreeClientResources+0xcb) [0x5622e0a9cb14]
15: X (FreeAllResources+0x49) [0x5622e0a9cc0b]
16: X (0x5622e09ff000+0x6d4cb) [0x5622e0a6c4cb]
17: X (0x5622e09ff000+0x24551b) [0x5622e0c4451b]
18: /lib64/libc.so.6 (0x7fd3d5c05000+0x26faa) [0x7fd3d5c2bfaa]
19: /lib64/libc.so.6 (__libc_start_main+0x85) [0x7fd3d5c2c065]
20: X (_start+0x21) [0x5622e0a168e1]


Fatal server error:
Caught signal 6 (Aborted). Server aborting

This was a problem in the modesetting driver before, with the vrr private key.
It didn't usually manifest because it is disabled by default, so the privates aren't initialized.

@metux Looking at the vrr privates, they are stored in the modesetting driver struct instead of being globals.
Is this right?

@stefan11111
Copy link
Contributor

Trying to debug this, when using current master, I get the error above when I don't enable vrr, but when I do enable vrr, the problem disappears.

@stefan11111
Copy link
Contributor

ACK from me, patch looks good and fixes the issues I could reproduce on my side.

@metux What is the difference between gpu screens and regular screens here?
Does the server split the screens into gpu and non-gpu based on hw acceleration or something?

How come this code works fine on Xorg?

@metux
Copy link
Contributor

metux commented Nov 24, 2025

@metux Looking at the vrr privates, they are stored in the modesetting driver struct instead of being globals. Is this right?

No, that's asking for trouble. We already had this discussion on some PR (don't recall exactly which one).
the private key structs must always be global, because they can be accessed some time later, when driver structs already gone.

If one needs separate instances per screen, that's what dixRegisterScreenPrivateKey() et al are for.

EDIT: the scenario is: a driver needs to attach some private data to eg. a pixmap, but here one per screen (IOW: on n screens, this pixmap will have n different privates, one for each screen).

@metux
Copy link
Contributor

metux commented Nov 24, 2025

ACK from me, patch looks good and fixes the issues I could reproduce on my side.

@metux What is the difference between gpu screens and regular screens here? Does the server split the screens into gpu and non-gpu based on hw acceleration or something?

For multi-screen and offloading (eg. via prime).
gpu-screens aren't directly visible (on the protocol), but are acting as slaves to the actual screens.

@stefan11111
Copy link
Contributor

@metux ping

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants