-
-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved handling of errors while sending TCP responses. #309
Conversation
…ting to the client, such as the client disconnected. FIX: Stop processing a response stream if there is no space left in the response queue.
…received and checks that the aborted write counter increases.
I’m not sure if considering non-fatal errors with TCP is necessary. Both Retrying when the configured Looking up |
bbd2d14
to
d4deae7
Compare
src/net/server/connection.rs
Outdated
/// | ||
/// The value has to be between 0 and 255 with a default of 2. A value of | ||
/// zero means that one try will be attempted. | ||
const RESPONSE_WRITE_RETRIES: DefMinMax<u8> = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this. What kind of error do we expect that should be retried. Is there something that is broken in Tokio?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I'm wondering myself now. I pulled this out of the xfr
branch where I thought it was added to do with problems I experienced in manual testing of an AXFR transfer to a client. In principle Tokio can return an I/O error on writing to the response stream, and this is about retrying in the case that that error is an interrupted or timeout error, but what would cause it that would be resolvable by retrying I don't know. The code was naively added in the thinking that a timeout is a problem that can be overcome by retrying to work around temporary problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed the response write retries config setting. The PR still has a useful additional test that I think is worth keeping.
I think we should a clear description what happens and how to reproduce. I don't think a tcp write is supposed to fail. So we have to know why it fails. |
Perhaps the response write timeout code should be removed, I don't know. It wasn't added in this PR however, so for now I leave that to think about, I haven't removed it. |
I need to review this PR as I want to be sure that I didn't just accidentally throw out this fix:
Update: Yes that was accidentally removed. Naively adding it back in helps when CTRL-C'ing a large dig AXFR transfer preventing a large number of |
Otherwise we keep trying to send to a gone client if we have more responses to send, such as for a large XFR transfer.
…s shutting down or the response queue is full.
The connection is now terminated on error, and the logs of an aborted large XFR transfer look much better:
|
…meaning that the message header was wrongly perceived as having the QR bit set so requests were rejected as replies, never being handled and thus never having responses sent, which couldn't fail because the connection was closed... hence breaking the test.
This PR contains a couple of fixes for TCP connection error handling: