socket recv failed for A blocking operation was interrupted by a call to WSACancelBlockingCall. (os error 10004), break recv loop #205

Open
lizhihongTest opened this issue Apr 12, 2023 · 3 comments
Labels
BDT bucky data transfer protocol bug Something isn't working

Comments

@lizhihongTest
Collaborator

Describe the bug
socket recv failed for A blocking operation was interrupted by a call to WSACancelBlockingCall. (os error 10004), break recv loop

To Reproduce
BDT emits the following error log, which may be related to a BDT UDP connection being shut down:

  [2023-04-12 20:15:30.002611 +08:00] INFO [ThreadId(35)] [component\cyfs-bdt\src\interface\udp.rs:368] UdpInterface {local:L4udp192.168.100.74:36242} socket recv failed for A blocking operation was interrupted by a call to WSACancelBlockingCall. (os error 10004), break recv loop

ood-installer.zip

@lizhihongTest lizhihongTest added bug Something isn't working BDT bucky data transfer protocol labels Apr 12, 2023
@lurenpluto
Member

This appears to be a common system error that 'recv' can return on a socket:

https://social.msdn.microsoft.com/Forums/vstudio/en-US/31866793-8064-4bce-a1e7-8bde3b793505/why-im-getting-exception-a-blocking-operation-was-interrupted-by-a-call-to-wsacancelblockingcall?forum=csharpgeneral

However, the key question is whether the socket can recover after 'recv' returns this error. The BDT layer currently logs "break recv loop" in this case. Could this terminate reception on this socket prematurely?

The relevant code is as follows:

fn recv_loop(&self, weak_stack: WeakStack) {
    UDP_RECV_BUFFER.with(|thread_recv_buf| {
        let recv_buf = &mut thread_recv_buf.borrow_mut()[..];
        loop {
            let rr = self.0.socket.recv_from(recv_buf);
            if rr.is_ok() {
                let stack = Stack::from(&weak_stack);
                let (len, from) = rr.unwrap();
                trace!("{} recv {} bytes from {}", self, len, from);
                let recv = &mut recv_buf[..len];
                // FIXME: dispatch to a worker thread
                self.on_recv(stack, recv, Endpoint::from((Protocol::Udp, from)));
            } else {
                let err = rr.err().unwrap();
                if let Some(10054i32) = err.raw_os_error() {
                    // On Windows, if host A calls sendto() on a UDP socket to send to host B,
                    // but B has no socket bound to that port and never receives the message,
                    // a later recvfrom() on host A fails and WSAGetLastError() returns 10054.
                    // This is a known Windows quirk.
                    trace!("{} socket recv failed for {}, ignore this error", self, err);
                } else {
                    info!("{} socket recv failed for {}, break recv loop", self, err);
                    break;
                }
            }
        }
    });
}
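One possible answer to the recoverability question is to classify each recv_from error instead of breaking on everything except 10054. The sketch below is a hypothetical policy, not cyfs-bdt code: RecvErrorAction, classify_recv_error and the closing flag (assumed to be set by the owner before it closes the socket) are invented names, and treating 10004 as retryable when no shutdown is in progress is an assumption, not confirmed Winsock behavior.

```rust
use std::io;

// Windows Sockets error codes, as listed in Microsoft's WinSock documentation.
const WSAEINTR: i32 = 10004; // blocking call interrupted (e.g. socket closed underneath it)
const WSAECONNRESET: i32 = 10054; // ICMP "port unreachable" from an earlier sendto()

/// What a receive loop could do after recv_from fails (hypothetical policy).
#[derive(Debug, PartialEq, Eq)]
pub enum RecvErrorAction {
    /// Log at trace level and keep receiving.
    Ignore,
    /// Log and keep receiving; the error may be transient.
    Retry,
    /// Leave the loop: shutdown in progress, or an unrecognized error.
    Break,
}

/// `closing` is a hypothetical flag the owner sets *before* closing the socket,
/// so an error caused by an intentional close can be told apart from a spurious one.
pub fn classify_recv_error(err: &io::Error, closing: bool) -> RecvErrorAction {
    if closing {
        return RecvErrorAction::Break;
    }
    match err.raw_os_error() {
        // Same benign Windows case the existing code already ignores.
        Some(WSAECONNRESET) => RecvErrorAction::Ignore,
        // Interrupted while nobody is shutting the socket down: assume transient.
        Some(WSAEINTR) => RecvErrorAction::Retry,
        _ => match err.kind() {
            io::ErrorKind::Interrupted | io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut => {
                RecvErrorAction::Retry
            }
            _ => RecvErrorAction::Break,
        },
    }
}
```

In recv_loop, Ignore and Retry would both 'continue' and only Break would 'break'. A real implementation would probably also cap consecutive retries so a permanently broken socket cannot spin.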

@jing-git
Collaborator

From the ood-installer log above:
[2023-04-12 20:15:26.490966 +08:00] INFO [ThreadId(22)] [component\cyfs-bdt\src\tunnel\udp.rs:148] UdpTunnel{local:L4udp192.168.100.74:36242,remote:L4udp172.19.0.1:8050} dead for connecting timeout
From this we can infer that an attempt to establish a tunnel over UDP timed out, so the socket was closed, which produced the WSACancelBlockingCall error (os error 10004) on Windows.
To find the cause of the timeout, check the following:

  1. Does 192.168.100.74 have a route to 172.19.0.1?
  2. Is UDP blocked by a firewall?
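Both checks can be approximated with a one-shot UDP probe: send a datagram and wait a bounded time for any reply. This is a minimal sketch; udp_probe is a hypothetical helper, and it only gives a meaningful answer if the peer actually replies (for example a temporary echo service on the target port). A BDT endpoint will not necessarily answer arbitrary bytes, so a timeout here does not by itself prove a routing or firewall problem.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};
use std::time::Duration;

/// Sends `payload` to `target` and waits up to `timeout` for any reply.
/// Ok(true): a reply arrived. Ok(false): nothing arrived before the timeout.
pub fn udp_probe(target: SocketAddr, payload: &[u8], timeout: Duration) -> io::Result<bool> {
    // Bind an ephemeral port on the same address family as the target.
    let bind_addr = if target.is_ipv4() {
        SocketAddr::from(([0u8, 0, 0, 0], 0))
    } else {
        SocketAddr::from(([0u16; 8], 0))
    };
    let socket = UdpSocket::bind(bind_addr)?;
    socket.set_read_timeout(Some(timeout))?;
    socket.send_to(payload, target)?;

    let mut buf = [0u8; 1500];
    match socket.recv_from(&mut buf) {
        Ok(_) => Ok(true),
        // A read timeout surfaces as WouldBlock on Unix and TimedOut on Windows.
        Err(e) if matches!(e.kind(), io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut) => Ok(false),
        // Other errors (e.g. 10054 on Windows when the port is closed) are passed through.
        Err(e) => Err(e),
    }
}
```

Running it from 192.168.100.74 against an echo listener on 172.19.0.1 would separate "no route / UDP filtered" (timeout) from "reachable but BDT did not answer".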

@lurenpluto
Member

I think the real question is whether this error is recoverable. Network jitter, firewalls, anti-virus software and various other factors may trigger it occasionally. If it is recoverable, shouldn't we avoid exiting the entire receive loop?

If possible, could we set up an environment that reproduces this error?
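Short of reproducing the real Windows condition, the error path can be exercised by injecting errors into the loop. This is a minimal sketch under stated assumptions: run_recv_loop, the injectable recv closure, the closing flag and MAX_CONSECUTIVE_ERRORS are all hypothetical and not part of cyfs-bdt. io::Error::from_raw_os_error(10004) builds an error carrying the same raw code on any platform, so the policy can be tested on Linux too.

```rust
use std::io;
use std::net::SocketAddr;
use std::sync::atomic::{AtomicBool, Ordering};

const WSAECONNRESET: i32 = 10054;
/// Hypothetical cap so a persistently failing socket cannot spin forever.
const MAX_CONSECUTIVE_ERRORS: u32 = 16;

/// Receive loop with an injectable `recv`, so error paths can be tested without
/// a real network fault. Returns the number of datagrams delivered to `on_recv`.
pub fn run_recv_loop<R, H>(mut recv: R, mut on_recv: H, closing: &AtomicBool) -> usize
where
    R: FnMut(&mut [u8]) -> io::Result<(usize, SocketAddr)>,
    H: FnMut(&[u8], SocketAddr),
{
    let mut buf = [0u8; 2048];
    let mut delivered = 0;
    let mut consecutive_errors = 0;
    loop {
        match recv(&mut buf) {
            Ok((len, from)) => {
                consecutive_errors = 0;
                on_recv(&buf[..len], from);
                delivered += 1;
            }
            // Intentional shutdown: leave quietly whatever the error code is.
            Err(_) if closing.load(Ordering::Acquire) => break,
            Err(err) => {
                // Known-benign Windows case: ignore without counting it.
                if err.raw_os_error() == Some(WSAECONNRESET) {
                    continue;
                }
                // Anything else, including 10004 outside shutdown, is retried up to the cap.
                consecutive_errors += 1;
                if consecutive_errors >= MAX_CONSECUTIVE_ERRORS {
                    break;
                }
            }
        }
    }
    delivered
}
```

A test can feed a scripted sequence (datagram, 10004, 10054, datagram, then set closing and return 10004) and check that both datagrams are delivered and the loop exits only on the final, intentional close.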

Projects
Status: 💬To Discuss
Development

No branches or pull requests

4 participants