Skip to content

windows: Fix server hangs under some circumstance #674

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jun 2, 2025

Conversation

JunkuiZhang
Copy link
Contributor

@JunkuiZhang JunkuiZhang commented Feb 28, 2025

Hi, thank you so much for creating such a high-quality Rust library.

We encountered a tricky issue while using this library in the development of Zed (https://github.com/zed-industries/zed). After testing, we identified the issue as a bug in notify. This PR provides a test case, test_windows_trash_dir(), which reproduces the bug.

The bug can be reproduced as follows: in a project directory /path/to/project, there is a subfolder /path/to/project/child. If we monitor changes in the child folder, then move the child folder to the trash (deleting the folder directly does not trigger the bug), and monitor the project directory, the server gets stuck at the WaitForSingleObjectEx command.

This issue seems to be related to the bAlertable flag in WaitForSingleObjectEx, but we are still unclear why setting bAlertable to TRUE causes this bug.

@JunkuiZhang
Copy link
Contributor Author

Moving a folder to the Recycle Bin or deleting a folder triggers ERROR_ACCESS_DENIED, but this error wasn't handled properly, leading to an infinite loop. Here’s what happens in detail:

  1. When dir_a is being watched, the server calls ReadDirectoryChangesW to monitor changes and registers handle_event as the completion routine.

  2. After dir_a is moved to the Recycle Bin, a change is detected, triggering handle_event. At this point, error_code is ERROR_ACCESS_DENIED.

  3. The original code does not handle this error, so execution continues, calling start_read, which in turn calls ReadDirectoryChangesW again.

  4. Since dir_a no longer exists, ReadDirectoryChangesW immediately returns 1, which is interpreted as a successful call. This causes handle_event to be registered again.

  5. Windows then calls handle_event again, with ERROR_ACCESS_DENIED, repeating the cycle indefinitely:

    handle_event(ERROR_ACCESS_DENIED) -> start_read(ReadDirectoryChangesW, return 1) -> handle_event(ERROR_ACCESS_DENIED) -> start_read(ReadDirectoryChangesW, return 1)...
    

Interestingly, when the folder is deleted instead of moved to the Recycle Bin, the second ReadDirectoryChangesW call returns 0, indicating failure, which prevents the loop. Additionally, setting bAlertable = FALSE stops the completion routine from being called, avoiding the issue entirely.

Changes in this PR:

  • Handle ERROR_ACCESS_DENIED: If the folder no longer exists, it has either been deleted or moved to the Recycle Bin, so we unwatch it.
  • Improve error handling: Known error codes are handled appropriately. For unknown error codes, start_read is not called to prevent potential infinite loops in the future.

@JunkuiZhang JunkuiZhang marked this pull request as ready for review March 7, 2025 09:13
@dfaust
Copy link
Member

dfaust commented Jun 1, 2025

Hi @JunkuiZhang. Thanks so much for fixing this issue. The changes look good.
Unfortunately I don't have Windows to test. But I think we can merge this once the pipeline has succeeded. The pipeline works on the current main branch. So maybe a rebase will fix the issue.
Also, would you mind adding a change-log entry?

@JunkuiZhang
Copy link
Contributor Author

Hi @JunkuiZhang. Thanks so much for fixing this issue. The changes look good. Unfortunately I don't have Windows to test. But I think we can merge this once the pipeline has succeeded. The pipeline works on the current main branch. So maybe a rebase will fix the issue. Also, would you mind adding a change-log entry?

I’ve rebased my changes onto the main branch and added the relevant entry to the changelog. BTW, if there’s anything that doesn’t look right or if there’s anything else you’d like me to tweak, just let me know!

@dfaust
Copy link
Member

dfaust commented Jun 1, 2025

Thanks for the quick update.

[target.'cfg(windows)'.dev-dependencies]
trash.workspace = true

Should fix the CI issue.

Copy link
Member

@dfaust dfaust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found two micro issues :)

_ => {
// Some unidentified error occurred, log and unwatch the directory, then return.
log::error!(
"Unknown error in ReadDirectoryChangesW for directory {}: {}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All other log messages start lower case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

return;
}
ERROR_ACCESS_DENIED => {
// This could hanppen when the watched directory is deleted or trashed, first check if it's the case.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happen

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@dfaust dfaust merged commit 65ea4b7 into notify-rs:main Jun 2, 2025
17 checks passed
@dfaust
Copy link
Member

dfaust commented Jun 2, 2025

Thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants