qBit v4.4.3.1 + libtorrent 2.0.x freezes regularly (Arch Linux) #6952
Some more information: I have hit this issue four times in the last month. When it happens, I have to long-press the power button to shut down, because even the shutdown freezes. Stack trace
|
Exactly the same problem. I had to press Alt + Print Screen (SysRq) + R, E, I, S, U, B to reboot the system. |
Could be related: #6937 |
It's possible. |
How many torrents do you have, and how old is the settings profile? Did you have similar issues with a fresh profile and only a very small number of torrents? Just wondering whether the issue I've encountered is the same as this one or a different one. |
Around 2431 torrents. |
I'm not sure how to check its age if it isn't recorded in any of the files, but it's surely old enough given the number of torrents. I recently restored my old profile, which I had cleaned up because of general issues I had with qBittorrent, in order to check some files of older torrents, and noticed that it was lagging heavily, especially on startup (not instantly, but perhaps a second after opening). Switching back to the new, almost fresh profile, on the other hand, made it work as it should. My number of torrents was much smaller, though, maybe around 100; perhaps too many of them were in a queued/unfinished state? Worth noting that I'm on Windows, which is why I'm unsure whether it's the same thing. But try renaming the old profile (not sure if on Linux the config and the stuff like |
I have gone back to using version 1.x of libtorrent; I don't really need the features of version 2.x. |
I also have similar issues. I've reverted to version 4.3.9 in a Docker container, but I'd prefer to run it natively. |
I can also confirm that I hit it with libtorrent-rasterbar-2.0.7 and net-p2p/qbittorrent-4.4.5-r1. My qbittorrent profile is certainly several years old. |
The problem is not the qbittorrent profile or its age, since the problem is not in qbittorrent. The problem is caused by libtorrent 2.x because of the way it manages memory and/or resources. |
I wonder whether it's really a libtorrent problem or libtorrent is just a trigger for it. When it happens, the qbittorrent process goes defunct: 'ps aux' hangs indefinitely reading /proc/PID_OF_QBT/cmdline, and killing it by PID is the only way to recover; however, it then becomes a zombie and hangs around as |
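When ps hangs on /proc/<pid>/cmdline like this, a narrower check is the one-letter state field in /proc/<pid>/stat (documented in proc(5)): D means uninterruptible sleep and Z means zombie. A minimal C sketch, assuming that reading /proc/<pid>/stat does not itself block on the affected kernel (this thread does not establish that); stat_state() and read_pid_state() are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the one-letter state from a /proc/<pid>/stat line.
 * Layout per proc(5): "pid (comm) state ppid ...". comm may itself contain
 * spaces or ')', so search for the LAST ')' in the line.
 * Returns e.g. 'R', 'S', 'D' or 'Z', or '\0' if the line looks malformed.
 */
char stat_state(const char *line)
{
    assert(line != NULL);
    const char *close = strrchr(line, ')');
    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '\0';
    return close[2];
}

/*
 * Hypothetical helper: read the state of `pid` without touching
 * /proc/<pid>/cmdline. Returns '\0' on any error.
 * Assumption: reading /proc/<pid>/stat does not block on the affected kernel.
 */
char read_pid_state(int pid)
{
    char path[64];
    char buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '\0';
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    return ok != NULL ? stat_state(buf) : '\0';
}
```

For example, read_pid_state() returning 'D' for the stuck qBittorrent PID would fit the hang described above, and 'Z' the zombie left after killing it.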
There is an interesting discussion in this qBittorrent thread: qbittorrent/qBittorrent#17545 (comment) |
I am not entirely sure if it's related, but since rebuilding qbittorrent with the 1.2 series of libtorrent, I no longer have this issue, and the NULL pointer dereference no longer shows up in my kernel log either. The issue could be triggering an in-kernel fault after all.
|
I have the same issue with Deluge. deluge: 2.1.1 x64 (running in Docker, linuxserver/deluge). The Unraid community has been tracking this issue here, where many of us using Deluge and qBittorrent have experienced the same problem. Note that this issue was only reported recently, with the newest series of the Unraid distribution (i.e., a newer Linux kernel). |
It seems it is indeed a kernel bug that gets triggered by libtorrent 2.x: the same NULL pointer dereference errors that my boxes reported above. |
Same problem: 6.0.9 gentoo-kernel, qbittorrent 4.4.5, libtorrent-rasterbar 2.0.6. |
... Reposting what I posted on the Deluge forum in response to JesterEE's reported findings:
|
More info about disabling THP (Transparent Huge Pages) here: I have it disabled and qBittorrent now works correctly. But the question is: whose fault is this malfunction? libtorrent, qBittorrent/Deluge or the Linux kernel? |
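For reference, the current THP mode can be read from /sys/kernel/mm/transparent_hugepage/enabled, where the active choice is shown in brackets (for example "always [madvise] never"); writing never to that file as root, or booting with transparent_hugepage=never, disables it. A small C sketch that reports the active mode; thp_active_mode() and read_thp_mode() are hypothetical helper names, and the parser only assumes the bracketed format just described.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the active mode (the word in brackets) from the contents of
 * /sys/kernel/mm/transparent_hugepage/enabled, e.g. "always [madvise] never".
 * Returns 0 on success, -1 if there is no bracketed word or it does not fit.
 */
int thp_active_mode(const char *contents, char *out, size_t outlen)
{
    assert(contents != NULL && out != NULL);
    const char *open = strchr(contents, '[');
    if (open == NULL)
        return -1;
    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return -1;
    size_t n = (size_t)(close - open - 1);
    if (n + 1 > outlen)
        return -1;
    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}

/* Read the live setting from sysfs; returns 0 on success, -1 on error. */
int read_thp_mode(char *out, size_t outlen)
{
    char buf[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    return ok != NULL ? thp_active_mode(buf, out, outlen) : -1;
}
```

If read_thp_mode() yields "never", THP is off for the running system; the setting written through sysfs does not survive a reboot, which is why the boot parameter is the usual persistent option.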
Looks like the issue is gone with kernel >= 6.0.12. |
Not for me. |
Are you guys using encryption? I've found that disabling queueing fixes the issue for me:
No encryption, but I noticed something: it deadlocks hard when adding a large torrent in qBit. But if you manage to kill qBit and restart it, it works just fine. |
I am using a simple patch to prevent the kernel NULL pointer dereference:
--- linux-gentoo/mm/filemap.c 2022-12-14 00:00:00.000 +0200
+++ linux-gentoo-old/mm/filemap.c 2022-12-06 00:00:00.000 +0200
@@ -1850,22 +1850,24 @@
 static void *mapping_get_entry(struct address_space *mapping, pgoff_t index)
 {
 	XA_STATE(xas, &mapping->i_pages, index);
 	struct folio *folio;
 	rcu_read_lock();
 repeat:
 	xas_reset(&xas);
 	folio = xas_load(&xas);
 	if (xas_retry(&xas, folio))
 		goto repeat;
+	if (WARN_ON(unlikely(xa_is_internal(folio))))
+		goto repeat;
 	/*
 	 * A shadow entry of a recently evicted page, or a swap entry from
 	 * shmem/tmpfs. Return it without attempting to raise page count.
 	 */
 	if (!folio || xa_is_value(folio))
 		goto out;
 	if (!folio_try_get_rcu(folio))
 		goto repeat;
 	if (unlikely(folio != xas_reload(&xas))) {
|
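The patch bails out with a warning when xas_load() returns an internal entry, which would otherwise slip past the !folio || xa_is_value(folio) test and be treated as a folio pointer. To make the tagging concrete, here is a user-space sketch of the low-bit encoding as I read include/linux/xarray.h: bit 0 set marks a value entry, low bits 10 mark an internal entry, and small internal values encode sibling slots. It assumes 64-slot nodes and CONFIG_XARRAY_MULTI; the enum, the macros and classify_entry() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Toy re-implementation of the tag tests in include/linux/xarray.h.
 * Assumes 64-slot nodes (XA_CHUNK_SHIFT == 6) and multi-index support. */
#define XA_CHUNK_SIZE 64u
#define MK_INTERNAL(v) (((uintptr_t)(v) << 2) | 2u)
#define IS_VALUE(e) (((uintptr_t)(e) & 1u) != 0)
#define IS_INTERNAL(e) (((uintptr_t)(e) & 3u) == 2u)

enum entry_kind {
    ENTRY_NULL, ENTRY_POINTER, ENTRY_VALUE, ENTRY_SIBLING,
    ENTRY_RETRY, ENTRY_ZERO, ENTRY_NODE, ENTRY_OTHER_INTERNAL
};

enum entry_kind classify_entry(uintptr_t e)
{
    if (e == 0)
        return ENTRY_NULL;
    if (IS_VALUE(e))
        return ENTRY_VALUE;          /* bit 0 set: tagged integer (shadow/swap entry) */
    if (!IS_INTERNAL(e))
        return ENTRY_POINTER;        /* low bits 00: an ordinary pointer, e.g. a folio */
    if (e < MK_INTERNAL(XA_CHUNK_SIZE - 1))
        return ENTRY_SIBLING;        /* 0x02, 0x06, ..., 0xfa: "see slot N" */
    if (e == MK_INTERNAL(256))
        return ENTRY_RETRY;          /* XA_RETRY_ENTRY */
    if (e == MK_INTERNAL(257))
        return ENTRY_ZERO;           /* XA_ZERO_ENTRY */
    if (e > 4096)
        return ENTRY_NODE;           /* tagged xa_node pointer; error entries are not split out here */
    return ENTRY_OTHER_INTERNAL;
}
```

Under these assumptions 0x02 classifies as ENTRY_SIBLING: it is neither NULL nor a value entry, so without the added check it would reach folio_try_get_rcu() as if it were a pointer.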
I am using disk encryption, and I hit the issue with queueing disabled. The problem is in the kernel page cache, before any read of the page. |
@corvus1 could you reproduce if you set aio_threads=1? |
Where do you put it? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
I still have this problem. |
This is probably true because in my case I was also using XFS and had the same problem. Now I use ZFS and it works fine. |
https://bugzilla.kernel.org/show_bug.cgi?id=216646#c18 says that the kernel bug was not reproducible in BTRFS, but I was using BTRFS when facing the issue here. |
I have the exact issue discussed here. Deluge 2.1.1 in Docker, running on kernel version 6.1.0. Each time the issue has occurred, my stack trace has been consistent with the others shared here. The bug takes a while to occur for me, and I'm not sure what specifically triggers it. When I killed the deluge instance this time, its virtual memory usage was around 800G. As for the filesystem: I use XFS for initial downloads (on my scratch disk), and the data is then moved to ZFS for seeding, so it's hard to say which filesystem, if any, is the culprit. For what it's worth, this is a high-performance setup on a 5Gbps connection. Judging by the timestamps in the kernel logs, the error occurred during a large download of several big torrents at once. It was probably quite busy at the time; I would guess that the vast majority of the work was handling the XFS writes. |
I believe it was just recently fixed upstream, but it hasn't been pulled into any released kernel yet, so you need to apply the patch yourself. -- https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/patch/?id=151dff099e8e6d9c8efcc75ad0ad3b8eead58704 For future visibility, the content of the patch is below:
|
The above short patch is actually a late addendum to a much larger patch that's still in the works. We'll definitely need the larger patch since it covers more bases and specifically the issue that is causing the stack traces posted so far. Poking around at my own stack trace it looks like |
This patch will not fix this issue. filemap_get_entry() sometimes returns an internal xarray value. In my case it is 0x02; it is neither NULL nor a valid pointer. The patch in #6952 (comment) works for me. |
I stand corrected. Nonetheless, it's good that this has been acknowledged upstream and that work on a fix is under way; it will eventually arrive in mainline. |
FWIW, I forwarded this issue over to Christoph Hellwig in case he has a moment to consider its relevance to his patch. |
@shanechko is correct. I just spent some more time digging into this, and it looks to me like some code in the kernel uses the xarray advanced API without fully checking/resolving internal entries. The result is that an internal sibling entry (basically just an index that points to another slot in the array) is used as a pointer, which of course gets dereferenced shortly afterwards and causes a kaboom. The solution is to fix the callers of the xarray advanced API (i.e., callers of xas_load) so that they deal with sibling entries properly. I don't have an environment spun up to build and test my own fix, but I've sent it to the kernel devs to see what they think. It's possible that the xarray getting into a state where multi-index/sibling entries are used is unusual, perhaps a path activated by using XFS, as many here have done. I believe the patch @shanechko made may work, but most likely the "right" thing to do with sibling entries is to use the xas_for_each_conflict iterator, as explained in the docs here: https://docs.kernel.org/core-api/xarray.html#multi-index-entries |
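A toy model of the multi-index case described here: one entry stored over several slots, where the extra slots hold sibling entries pointing back at the canonical slot. Reading such a slot raw returns the tagged sibling value (0x02 for a sibling of slot 0), which is exactly what crashes if dereferenced; a sibling-aware lookup follows it to the canonical slot first, roughly like the sibling loop in the kernel's xas_descend() as I understand it. toy_node, toy_store_multi(), toy_load_raw() and toy_load() are hypothetical illustration names; this is neither the kernel's struct xa_node nor the upstream fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: a single 64-slot node, not the kernel's struct xa_node. */
#define SLOTS 64u
#define MK_SIBLING(off) ((void *)(((uintptr_t)(off) << 2) | 2u))
#define IS_SIBLING(e) ((((uintptr_t)(e) & 3u) == 2u) && \
                       (uintptr_t)(e) < (((uintptr_t)(SLOTS - 1) << 2) | 2u))
#define TO_SIBLING(e) ((unsigned)((uintptr_t)(e) >> 2))

struct toy_node {
    void *slots[SLOTS];
};

/* Store `item` over slots [first, first + count): the canonical slot holds
 * the pointer, the remaining slots hold sibling entries pointing back to it. */
void toy_store_multi(struct toy_node *n, unsigned first, unsigned count, void *item)
{
    assert(count > 0 && first + count <= SLOTS);
    n->slots[first] = item;
    for (unsigned i = first + 1; i < first + count; i++)
        n->slots[i] = MK_SIBLING(first);
}

/* Naive lookup: returns whatever the slot holds, sibling tags included. */
void *toy_load_raw(const struct toy_node *n, unsigned index)
{
    return n->slots[index];
}

/* Sibling-aware lookup: follow sibling entries to the canonical slot. */
void *toy_load(const struct toy_node *n, unsigned index)
{
    void *e = n->slots[index];
    while (IS_SIBLING(e))
        e = n->slots[TO_SIBLING(e)];
    return e;
}
```

Storing one item over slots 0 to 3 and reading slot 2 raw yields (void *)0x02, matching the value reported earlier in the thread, while toy_load() returns the real item.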
Verifying that the memory address belongs to this process: done. Are bad values other than 0x02 and 0x00 possible?
Is that why it is so fast? (≧ω≦)/ |
It has been stable for me after updating to kernel 6.5.0-rc1, can anyone else update and check it out? |
I deployed 6.5.0-rc2 and the 2.x series of libtorrent yesterday. So far it's stable; I'm waiting a few more days to see whether it triggers again. |
What would be the commit that fixed the issue in the kernel? It is not "fix null-pointer deref" from above, right? BTW, this only affects some non-standard filesystems, not ext4? |
Yes. Kernel 6.5 adds some locks to process virtual memory mappings to prevent races like this.
I think not. This issue is caused by memory management: a race in the page-mapping code. |
I was able to reproduce the issue on 6.5-rc3. The proposed patch from Matthew Wilcox is here: It looks like it does the job for me. Can someone else who is impacted also check whether it resolves the problem? |
Fixed in upstream kernel commit 31cd0859dc1fe345ebacc319b72a03e9cf80cf41 |
The "upstream" is always Linus's tree, with the latest release being 6.6-rc1 at the moment ;) The commit ID there is cbc02854331edc6dc22d8b77b6e22e38ebc7dd51: Also fixed in stable releases: And backported to longterm 6.1: |
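A rough way to compare the running kernel release against a minimum of your choosing, using uname(2). Since the fix also landed in stable releases and was backported to longterm 6.1, a single "at least X.Y.Z" comparison is only a heuristic; checking your distribution's changelog for the commit is more reliable. parse_release(), release_at_least() and running_kernel_at_least() are hypothetical names, and -rc suffixes are ignored, so 6.5.0-rc1 compares equal to 6.5.0.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Parse "major.minor[.patch][suffix]", e.g. "6.5.0-rc2" or "6.1.0-13-amd64".
 * A missing patch level is treated as 0; any suffix is ignored.
 * Returns 0 on success, -1 if not even "major.minor" could be read. */
int parse_release(const char *rel, int v[3])
{
    assert(rel != NULL);
    v[0] = v[1] = v[2] = 0;
    int n = sscanf(rel, "%d.%d.%d", &v[0], &v[1], &v[2]);
    return n >= 2 ? 0 : -1;
}

/* 1 if rel >= maj.min.pat, 0 if older, -1 if rel is unparsable. */
int release_at_least(const char *rel, int maj, int min, int pat)
{
    int v[3];
    if (parse_release(rel, v) != 0)
        return -1;
    if (v[0] != maj)
        return v[0] > maj;
    if (v[1] != min)
        return v[1] > min;
    return v[2] >= pat;
}

/* Same check against the running kernel; -1 if uname() fails. */
int running_kernel_at_least(int maj, int min, int pat)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    return release_at_least(u.release, maj, min, pat);
}
```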
bumping for progress |
I have not faced this issue for a month since kernel 6.5.3. |
Me neither
|
Thank you for maintaining the qbittorrent libtorrent v1 package on Arch; it has been very nice. After reading this comment, I decided to try the regular package provided by the Arch repos, but I was disappointed by how much lower the performance was compared to v1. The speeds on v2 are just awful compared to v1 on my system; honestly, I don't know what's up with that. My kernel is 6.5.9, but that doesn't seem to help, at least not for me. I guess from now on I will grab the AppImage of qbittorrent v1 from their own website, since you have abandoned the package. |
Well, this is fixed. |
qbittorrent: 4.4.3.1 x64
OS: Arch Linux
Qt: 6.3.0 and 5.15.4
libtorrent-rasterbar: 2.0.6
I first reported it in the qBittorrent bug tracker, but the problem only happens with libtorrent 2.x; with 1.x it works without problems. So I'm not sure where the problem really lies: in the program or in the library.
My computer is always on with qBittorrent running. Every now and then qBittorrent freezes.
I have to restart the computer because I can't even kill the process.