Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add protocol for handling Request & Response #372

Open
ikitommi opened this issue Jul 6, 2019 · 7 comments
Open

Add protocol for handling Request & Response #372

ikitommi opened this issue Jul 6, 2019 · 7 comments

Comments

@ikitommi
Copy link
Contributor

ikitommi commented Jul 6, 2019

Many modern Java Web Server (Netty, Undertow, Vert.x) have their own abstractions for Requests and Responses. Current ring-spec states that requests and responses as map-like structures. While this is awesome in abstraction-wise (and the killer feature of Ring), performance-wise it can be bad as data needs to be copied into maps. Copying can be eager (like with the jetty-adapter) for all requests or lazy (like Aleph & Immutant) / on demand, but still things like accessing a single header value forced all headers to be read copied and keys lowercased.

Abstracting the request and response as protocols would allow best of both worlds. A protocol like:

(defprotocol RingRequest
  (get-server-port [this])
  (get-server-name [this])
  (get-remote-addr [this])
  (get-uri [this])
  (get-query-string [this])
  (get-scheme [this])
  (get-request-method [this])
  (get-protocol [this])
  (get-headers [this])
  (get-header [this header])
  (get-body [this])
  (get-context [this]))

could be implemented for maps so that they do the normal lookup, but the Aleph-style zero-copy-request (map-like) classes could implement those directly. (get-header request "Content-Type") would use the already parsed information without any copying.

This would allow low-level libraries to be programmed against effective protocols while keeping everything compatible with maps.

Currently working on a new lightweight and zero-fat new wrapper for Undertow and using a custom request protocol (like the one above) with it, but would not like to tie our other libraries to a server-specific protocols.

All the current web servers having their own request & response (map-like) types would need to conform to the new protocols, so would be a breaking change.

@weavejester
Copy link
Member

Could you explain a little more how you envision this working in practice?

Are you saying that the request passed to the handler would both imitate a map and implement this protocol, and the protocol would be for performant access? What about middleware that adds or removes keys?

@ikitommi
Copy link
Contributor Author

ikitommi commented Jul 7, 2019

Yes, implement both a map and the protocol, example here currently using Potemkin while trying to figure out how to best implement that.

Why not def-derived-map:

It is fast to create but slow to use: access functions first check if there is a overridden value, if not, run the code to extract, e.g. double lookup + re-calculating headers per each invocation. Should be memoized at least.

Why not eagerly creating the ring-request, once?

Copying all the headers from Undertow into ring specific format is around 500-2000ns on my Macbook, getting one header via the protocol is around 20ns. Also, eagerly resolving :server-port and :remote-addr are both slow on Undertow, around 70-100ns each and hardly needed on normal request handling.

modifying request

  • adding keys is easy
  • removing or redefining keys: good point, would need a fast bookkeeping on wether to dispatch to the original implementation or to the overridden value. Some code generation like defrecord does currently to avoid the costly lookups on maps. Will try to cook up something.

@ikitommi
Copy link
Contributor Author

ikitommi commented Sep 30, 2019

Did a spike on a PartialCopyRequest in pohjavirta, looks really good: I have a custom Map-like new Type, that has two fields: a Record that contains the fields that are always copied from java-land (:uri and :request-method currently) and a lazy Map-like thing (currently a derived-map from Potemkin that is used if the first one doesn't contain the given key. All modifications are done into the first, which will mask the values in the second. Remove removes from both. As it's a new type, it could implement the ring.protocols/Request protocol, pointing to an exact field in the Record, making it really fast.

(defrecord Request [uri request-method])

(defn create 
  "Creates a partial-copy request where the commonly needed
  keys are copied to an internal [[Request]] Record, while
  rest of the keys are handled via [[ZeroCopyRequest]]."
  [^HttpServerExchange exchange]
  (->PartialCopyRequest
    ;; eager copy
    (->Request
      (.getRequestURI exchange)
      (-> exchange .getRequestMethod .toString .toLowerCase keyword))
    ;; realize on access
    (->ZeroCopyRequest exchange)))

reading the :uri:

(def request (create ...))

;; 12ns (clojure machinery)
(cc/quick-bench
  (:uri request))

;; 5ns (via protocol)
(cc/quick-bench
  (ring/get-uri request))

... with some extraction, the PartialCopyRequest could be done via a macro, so the whole machinery would be easy to use (by the ring adapter maintainers), would be standalone and could be hosted somewhere in.... ring repo?

@weavejester
Copy link
Member

Before this is considered, I'd like to have some good metrics on whether this would significantly improve performance. Given that we'd only be saving 7ns for a key lookup, and there is likely only going to be a few key lookups per request, any performance improvement seems like it would be relatively minor.

@ikitommi
Copy link
Contributor Author

ikitommi commented Oct 1, 2019

I'll do a comprehensive benchmark. The real win is in the reading single header via a protocol (get-header [this header]) instead of copying and re-casing those just if someone needs those. Not useful if one want's to log the request thou. But, let the numbers tell. Cheers.

@ikitommi
Copy link
Contributor Author

ikitommi commented Mar 3, 2020

Haven't had time to do any comprehensive stuff, but quick study on the current jetty "copy-all" model using Criterium on my macbook:

REQ
; => #object[org.eclipse.jetty.server.Request 0x5cb44555 "Request(POST //localhost:8002/)@5cb44555"]

(build-request-map REQ)
;{:ssl-client-cert nil,
; :protocol "HTTP/1.1",
; :remote-addr "0:0:0:0:0:0:0:1",
; :headers {"accept" "application/json, */*",
;           "user-agent" "HTTPie/2.0.0",
;           "connection" "keep-alive",
;           "host" "localhost:8002",
;           "accept-encoding" "gzip, deflate",
;           "content-length" "19",
;           "content-type" "application/json"},
; :server-port 8002,
; :content-length 19,
; :content-type "application/json",
; :character-encoding "UTF-8",
; :uri "/",
; :server-name "localhost",
; :query-string nil,
; :body #object[org.eclipse.jetty.server.HttpInputOverHTTP
;               0x7905558d
;               "HttpInputOverHTTP@7905558d[c=0,q=0,[0]=null,s=STREAM]"],
; :scheme :http,
; :request-method :post}

;; 2.4 µs
(cc/quick-bench
  (build-request-map REQ))

Same as a flamegraph:

Screenshot 2020-03-03 at 11 04 53

... copying the headers takes 55% of the total time. From the clojure map, getting a single header using get-in:

;; 113ns
(cc/quick-bench
  (get-in req [:headers "content-type"]))

compared to a zero-copy solution:

;; a type that implements all needed map methods + ring/Request
;; all lookups (e.g. `get`) are done first to the internal lookup-map + all changes are written there (a full copy is returned on `assoc`,  `dissoc` etc.
;; secondary, lookup is done directly to the underlaying Java Request, potentially cached
;; ... looks like a map, at max 2 lookups to find a value, but zero-copying on creation
(deftype ZeroCopyRequest [_data ^Request _request]
  ILookup
  (valAt [_ k]
    ; ... fast dispatch on known keys, normal lookup for _data for others
  IPersistentMap
  (assoc [_ k v]
     ; .... copy with modified _data
 ...
  ring/RingRequest
  (get-header [_ header]
    (or (some-> _data :headers (get (str/lower-case header)))
        (str/join "," (enumeration-seq (.getHeaders _request header))))))

;; a zero-copy request
(def zreq (->ZeroCopyRequest {}, REQ));

;; 132ns
(cc/quick-bench
  (ring/get-header zreq "content-type"))

the get-header is ignore-case, so (ring/get-header REQ "ContenT-tYPe") works too (implemented like this on the Java web servers already, efficiently)

I think

  • protocols allow orders of magnitude faster mapping when only some things are needed, as there is no work in copying all the stuff forehand
  • adding a Request (& Response) protocols to ring would allow writing really performant components like middleware (just peeking in into a single header like "content-type" via the protocols) - using Zero or PartialCopyRequests
  • flushing the whole request into log would remove all the benefits of protocols, but can't help with that

As there is Ring2 comping up, this could complement it? I could start/help to do a real jetty-zero-copy version if this is something that could go into ring.

@weavejester
Copy link
Member

I've been recently reconsidering this, as it could also be used for compatibility. You've provided an example of a possible request protocol, but did you have any thoughts on the response protocol?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants