Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly support *conditional* WebSocket upgrades #473

Open
mslupny opened this issue Aug 18, 2021 · 7 comments
Open

Properly support *conditional* WebSocket upgrades #473

mslupny opened this issue Aug 18, 2021 · 7 comments

Comments

@mslupny
Copy link

mslupny commented Aug 18, 2021

http-kit is non conforming to https://datatracker.ietf.org/doc/html/rfc6455#section-1.3:

The handshake from the server is much simpler than the client
handshake.  The first line is an HTTP Status-Line, with the status
code 101:

HTTP/1.1 101 Switching Protocols

Any status code other than 101 indicates that the WebSocket handshake
has not completed and that the semantics of HTTP still apply.  The
headers follow the status code.

The root cause of the problem in http-kit is that the selector attachment (WsAtta) is
assigned based on HTTP request headers [1], when it should be done only after
making sure the response has status code 101.

It seems to me that fixing the problem in http-kit would require a major architectural change
because as of now WebSocket handshake is sent directly to the socket,
bypassing the callback chain [2], so there's no way of knowing that server
handler returned 101.

Please let me know if I'm missing something and there's a way of fixing it without a major rewrite - I'm willing to work on a patch :).

A simple test that reproduces the problem:

(deftest unsuccessful-ws-conn-upgrade-doesn't-prevent-subsequent-connections
  ;; force using a single TCP connection
  (http/with-connection-pool {:threads 1}
    (let [url "http://localhost:4347/"
          resp (http/get url {:headers {:connection "upgrade"
                                        :upgrade "websocket"}})]

      (testing "Server doesn't respond with 101 Switching Protocols"
        (is (= (:status resp) 200))
        (is (= (:body resp) "hello world")))

      (testing "Subsequent request handled correctly"
        (let [resp (http/get url {:retry-handler (constantly false)})]
          (is (= (:status resp) 200))
          (is (= (:body resp) "hello world")))))))

The second http/get fails with the exception:

ERROR in (unsuccessful-ws-conn-upgrade-doesn't-prevent-subsequent-connections) (DefaultHttpResponseParser.java:141)
Uncaught exception, not in assertion.
expected: nil
  actual: org.apache.http.NoHttpResponseException: localhost:4347 failed to respond
 at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead (DefaultHttpResponseParser.java:141)
    org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead (DefaultHttpResponseParser.java:56)
    org.apache.http.impl.io.AbstractMessageParser.parse (AbstractMessageParser.java:259)
    org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader (DefaultBHttpClientConnection.java:163)
    org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader (CPoolProxy.java:157)
...

tcpdump output from the test:

~/dev/3rdparty/http-kit on master *1 !1 > sudo tcpdump -i lo0 tcp port 4347 -nA
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo0, link-type NULL (BSD loopback), capture size 262144 bytes
14:33:11.607533 IP 127.0.0.1.60186 > 127.0.0.1.4347: Flags [S], seq 3485748767, win 65535, options [mss 16344,nop,wscale 6,nop,nop,TS val 3369682175 ecr 0,sackOK,eol], length 0
E..@..@[email protected]....?........
..D.........
14:33:11.607619 IP 127.0.0.1.4347 > 127.0.0.1.60186: Flags [S.], seq 1402783345, ack 3485748768, win 65535, options [mss 16344,nop,wscale 6,nop,nop,TS val 1652583967 ecr 3369682175,sackOK,eol], length 0
E..@..@[email protected] .....4....?........
b.n...D.....
14:33:11.607627 IP 127.0.0.1.60186 > 127.0.0.1.4347: Flags [.], ack 1, win 6379, options [nop,nop,TS val 3369682175 ecr 1652583967], length 0
E..4..@[email protected] S..r.....(.....
..D.b.n.
14:33:11.607634 IP 127.0.0.1.4347 > 127.0.0.1.60186: Flags [.], ack 1, win 6379, options [nop,nop,TS val 1652583967 ecr 3369682175], length 0
E..4..@[email protected] .....(.....
b.n...D.
14:33:11.609891 IP 127.0.0.1.60186 > 127.0.0.1.4347: Flags [P.], seq 1:166, ack 1, win 6379, options [nop,nop,TS val 3369682177 ecr 1652583967], length 165
E.....@[email protected] S..r...........
..E.b.n.GET / HTTP/1.1
Connection: upgrade
Upgrade: websocket
accept-encoding: gzip, deflate
Host: localhost:4347
User-Agent: Apache-HttpClient/4.5.10 (Java/16.0.1)


14:33:11.609911 IP 127.0.0.1.4347 > 127.0.0.1.60186: Flags [.], ack 166, win 6377, options [nop,nop,TS val 1652583969 ecr 3369682177], length 0
E..4..@[email protected]......(.....
b.n!..E.
14:33:11.650532 IP 127.0.0.1.4347 > 127.0.0.1.60186: Flags [P.], seq 1:146, ack 166, win 6377, options [nop,nop,TS val 1652584008 ecr 3369682177], length 145
E.....@[email protected]............
b.nH..E.HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Content-Length: 11
Server: http-kit
Date: Wed, 18 Aug 2021 12:33:11 GMT

hello world
14:33:11.650569 IP 127.0.0.1.60186 > 127.0.0.1.4347: Flags [.], ack 146, win 6377, options [nop,nop,TS val 3369682216 ecr 1652584008], length 0
E..4..@[email protected]........(.....
..E(b.nH
14:33:11.659389 IP 127.0.0.1.60186 > 127.0.0.1.4347: Flags [P.], seq 166:314, ack 146, win 6377, options [nop,nop,TS val 3369682224 ecr 1652584008], length 148
E.....@[email protected]..............
..E0b.nHGET / HTTP/1.1
accept-encoding: gzip, deflate
Host: localhost:4347
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.10 (Java/16.0.1)


14:33:11.659411 IP 127.0.0.1.4347 > 127.0.0.1.60186: Flags [.], ack 314, win 6374, options [nop,nop,TS val 1652584016 ecr 3369682224], length 0
E..4..@[email protected].....(.....
b.nP..E0
14:33:11.661999 IP 127.0.0.1.4347 > 127.0.0.1.60186: Flags [F.], seq 146, ack 314, win 6374, options [nop,nop,TS val 1652584018 ecr 3369682224], length 0
E..4..@[email protected].....(.....
b.nR..E0

As you can see, the server closes the connection right after receiving the second request (because HttpServer.decodeWs is called instead of HttpServer.decodeHttp).

[1] https://github.com/http-kit/http-kit/blob/master/src/java/org/httpkit/server/HttpServer.java#L183
[2] https://github.com/http-kit/http-kit/blob/master/src/java/org/httpkit/server/AsyncChannel.java#L164

@ptaoussanis
Copy link
Member

@mslupny Hi Matt, thanks for bringing attention to this! Apologies for the slow response.

I'm not familiar with the details on this, and unfortunately won't have much time to investigate myself. But one thing I want to check: the linked spec is described as non-normative.

I would generally interpret that as meaning that the section describes recommended or pedagogical behaviour rather than a requirement.

Are you familiar with the possible implications of http-kit not confirming here?

If we're talking about a possible non-trivial architectural change to make/investigate, I think it'd be helpful to be very clear on what the benefits would be (or costs of not doing so).

How did you come across this issue? Have you actually encountered some problem in production?

Thanks!

@mslupny
Copy link
Author

mslupny commented Jun 16, 2022

Unfortunately, we hit this problem in production and noticed it because of errors in nginx logs. Basically you can replace the HTTP client from the test case above with nginx and everything still holds:

  1. a customer makes a request to /ws to a LB
  2. nginx dispatches that request to our backend
  3. we decide to reject that request, e.g. with 401
  4. nginx accepts the response and keeps the TCP connection to be reused when needed
  5. subsequent request (it might come from another customer!) dispatched by nginx on that TCP connection fails

The workaround is to include the header "Connection: close" in the response in step 3, which prevents reusing the TCP connection by nginx.

I understand this is an edge case and not something that justifies rearchitecting the library just to get it fixed, especially since a workaround exists. It might be a good idea to just document the workaround somewhere, and hope the scenario is handled in case http-kit gets a rewrite in the future 😉.

@ptaoussanis
Copy link
Member

Unfortunately, we hit this problem in production and noticed it because of errors in nginx logs. Basically you can replace the HTTP client from the test case above with nginx and everything still holds:

I'm sorry, I'm not sure I'm following. Where does nginx enter into the picture?

Are you suggesting that nginx has the same behaviour as http-kit, and that you actually encountered this problem with nginx?

@mslupny
Copy link
Author

mslupny commented Jun 16, 2022

Not at all. You asked whether this is a hypothetical problem or something encountered in production. We encountered it in production in a setup with nginx used as reverse proxy and a Clojure application with http-kit used as HTTP server. In this architecture, nginx makes HTTP requests to http-kit, and this is how we found the problem.

To be clear: the problem is not specific to nginx. You can reproduce it with any HTTP client that can reuse TCP connections.

Feel free to ping me on Clojurians Slack in case there's still something unclear, might be faster that way.

@ptaoussanis
Copy link
Member

We encountered it in production in a setup with nginx used as reverse proxy and a Clojure application with http-kit used as HTTP server. In this architecture, nginx makes HTTP requests to http-kit, and this is how we found the problem.

Okay, that was helpful - thank you.

So to confirm in some more detail:

  1. We're talking here about Http-kit's behaviour as a server, not as a client.
  2. Client makes WebSocket request to Http-kit.
  3. Http-kit automatically+unconditionally expects to upgrade connection to WebSocket.
  4. But application handler wants to refuse upgrade, and so responds with 401 without as-channel handler (and therefore without the usual 101 handshake response).
  5. Nginx interprets the 401 as a refusal to upgrade the connection to WebSocket protocol, and therefore expects to reuse the connection later as standard HTTP.
  6. However Http-kit continues to regard the connection as upgraded.
  7. Future Nginx requests with that connection fail because of the protocol disagreement.

Is that correct so far?

@mslupny
Copy link
Author

mslupny commented Jun 17, 2022

Spot on! 🙌

@ptaoussanis ptaoussanis changed the title http-kit not compliant with WebSockets RFC Properly support *conditional* WebSocket upgrades Mar 6, 2023
@ptaoussanis
Copy link
Member

@mslupny Thanks Matt! Apologies for the long delay replying.

I've updated the issue title based on my understanding of the above issue summary.

From my superficial understanding, it seems like the core issue may be around point 3.. I.e. it seems that http-kit server may be incorrectly presuming that WebSocket upgrade requests will be accepted, leaving post-rejection connections in a bad state.

Input and/or PRs very welcome 👍

@ptaoussanis ptaoussanis added the bug label Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants