Debug and Fix shutdown #50

Open
Yaxuan-w opened this issue Oct 9, 2024 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@Yaxuan-w
Member

Yaxuan-w commented Oct 9, 2024

Previously, when testing the newest fdtable interface with the Lind general test suite, the shutdown test failed, and the error seems to be related to close. I am opening this issue to track the progress of debugging and fixes.

Yaxuan-w added the bug label Oct 9, 2024
Yaxuan-w self-assigned this Oct 9, 2024
@Yaxuan-w
Member Author

Yaxuan-w commented Oct 9, 2024

After performing close() in one process, a read() syscall in another process should return immediately because the socket has already been closed. However, we found that the Rust libc call never returns. My analysis is that after porting to the new fdtables, the logic of the inner close also needs to change. Further analysis and a fix plan are needed. Fixing this bug should also resolve the issues with shutdown and related syscalls, so I am commenting here.

@Yaxuan-w
Member Author

Yaxuan-w commented Nov 6, 2024

There’s an occasional bug in the shutdown.c test, but the other tests related to shutdown work well (shutdown_fork.c and simple programs I wrote). The cause is that during close_syscall, an exit_syscall is also triggered as part of the handling process. Sometimes exit_syscall removes the fdtable first while close_syscall is still running. As a result, close_syscall might attempt to close a file descriptor that has already been removed by exit_syscall.

To handle concurrent modifications to the fdtable, I propose adding locks. Given the high frequency of read and write operations on the fdtable interface, I recommend using Arc for efficient synchronization. However, I'd like to open this up for discussion before proceeding with the fix. It would be really helpful if @JustinCappos @rennergade @yzhang71 could share any suggestions or comments on this issue.
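A minimal sketch of the locking proposal above, under stated assumptions: the table here is a plain `HashMap` standing in for the fdtable, and `SharedTable`, `new_table`, `close_fd`, `exit_cleanup`, and `race_close_against_exit` are hypothetical names, not the real fdtables or RawPOSIX API. Note that `Arc` on its own only provides shared ownership across threads; the mutual exclusion comes from the `RwLock` (or a `Mutex`) wrapped inside it. With both operations taking the same lock, a close that loses the race sees a missing entry and can report `EBADF` instead of acting on a descriptor that is already gone.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// "Bad file descriptor" errno value on Linux.
const EBADF: i32 = 9;

/// Hypothetical stand-in for a shared fd table: virtual fd -> kernel fd.
type SharedTable = Arc<RwLock<HashMap<i32, i32>>>;

/// Build a shared table from (virtual fd, kernel fd) pairs.
fn new_table(entries: &[(i32, i32)]) -> SharedTable {
    Arc::new(RwLock::new(entries.iter().copied().collect()))
}

/// Models close_syscall: remove one entry under the write lock.
/// A missing entry is reported as EBADF rather than treated as fatal.
fn close_fd(table: &SharedTable, vfd: i32) -> Result<i32, i32> {
    let mut guard = table.write().unwrap();
    guard.remove(&vfd).ok_or(EBADF)
}

/// Models exit_syscall: drop every entry under the same lock.
fn exit_cleanup(table: &SharedTable) {
    table.write().unwrap().clear();
}

/// Runs close and exit concurrently and returns what close observed.
/// Either ordering is possible; both yield a well-defined result.
fn race_close_against_exit() -> Result<i32, i32> {
    let table = new_table(&[(3, 10)]);
    let t_exit = {
        let t = Arc::clone(&table);
        thread::spawn(move || exit_cleanup(&t))
    };
    let t_close = {
        let t = Arc::clone(&table);
        thread::spawn(move || close_fd(&t, 3))
    };
    t_exit.join().unwrap();
    t_close.join().unwrap()
}
```

Calling `race_close_against_exit()` returns either `Ok(10)` (close won) or `Err(EBADF)` (exit won). This only makes the table operations well-defined; it does not address what happens to a thread that is aborted in the middle of a syscall.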

@JustinCappos
Member

I don't think this needs to directly go into fdtables. The issue here isn't locking, as I understand it, but it's what happens when a thread is aborted.

The big question from yesterday is what happens when a shutdown occurs. What happens if you pthread kill a Rust thread? Should we be doing this? What happens? Is this ever likely to be safe? We need to understand these things first.

@rennergade
Contributor

I thought about this a bit last night and it should be simple enough to safely exit threads within trusted code (RawPOSIX) by adding an exit flag and checking it (a bit more complicated when in blocking kernel calls but we can implement the same thing we did in RustPOSIX with timeouts).
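The exit-flag idea above could look roughly like the following sketch. It is an illustration only: an `mpsc` channel with `recv_timeout` stands in for a blocking kernel call issued with a timeout, and `WaitResult`, `recv_until_exit`, and `demo_exit_flag` are hypothetical names, not code from RawPOSIX or RustPOSIX. The blocking wait is split into short slices, and the flag is checked between slices so the thread can return on its own instead of being killed.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, RecvTimeoutError};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Outcome of a blocking wait that also polls the exit flag.
#[derive(Debug, PartialEq)]
enum WaitResult {
    Data(u8),
    Exited,
    Closed,
}

/// Wait for data in short slices, checking the exit flag between slices.
fn recv_until_exit(rx: &mpsc::Receiver<u8>, exit: &AtomicBool, slice: Duration) -> WaitResult {
    loop {
        if exit.load(Ordering::Acquire) {
            return WaitResult::Exited;
        }
        match rx.recv_timeout(slice) {
            Ok(byte) => return WaitResult::Data(byte),
            // Timed out with no data: loop around and re-check the flag.
            Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => return WaitResult::Closed,
        }
    }
}

/// A worker blocks with no data available until another thread sets the flag.
fn demo_exit_flag() -> WaitResult {
    let exit = Arc::new(AtomicBool::new(false));
    let (tx, rx) = mpsc::channel::<u8>();
    let worker = {
        let exit = Arc::clone(&exit);
        thread::spawn(move || recv_until_exit(&rx, &exit, Duration::from_millis(10)))
    };
    thread::sleep(Duration::from_millis(50));
    exit.store(true, Ordering::Release); // request a cooperative exit
    let result = worker.join().unwrap();
    drop(tx); // the sender stays alive until the worker has returned
    result
}
```

The slice length trades exit latency against wakeup overhead: the worker notices the flag within roughly one slice of it being set. This covers only threads running trusted code that cooperates by checking the flag.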

There are fewer options for exiting while stuck in untrusted code (i.e., userspace loops, etc.). I wonder if we can harness this to handle them: bytecodealliance/wasmtime#1490
