Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Params middleware does not decode some strings correctly #269

Open
iku000888 opened this issue Nov 29, 2016 · 5 comments
Open

Params middleware does not decode some strings correctly #269

iku000888 opened this issue Nov 29, 2016 · 5 comments
Labels

Comments

@iku000888
Copy link

iku000888 commented Nov 29, 2016

Hi James, thanks for the continued work on maintaining ring.

Just want to throw this issue out there in case anyone else stumbles upon the same issue, plus gathering some feedback.

In our project, we were dealing with non utf-8 encoded forms (Shift_JIS to be specific), and found that the params middleware is garbling the incoming string, even though we were using the wrap-params middleware with the Shift_JIS option e.g. (param/wrap-params {:encoding "Shift_JIS"})

Looking at the source code deeper inside, I see that java.net.URLDecoder is being used for parsing, and that java.net.URLDecoder does not decode some non utf-8 strings correctly, while org.apache.commons.codec.net.URLCodec can, illustrated in below snippet.

  ;; "モジバケコワイ", URL encoded with Shif_JIS by the browser
  "%83%82%83W%83o%83P%83R%83%8F%83C"

 (import [java.net URLDecoder])
 (URLDecoder/decode "%83%82%83W%83o%83P%83R%83%8F%83C" "Shift-JIS")
 ;; => "モ�W�o�P�Rワ�C" 

 (import [org.apache.commons.codec.net URLCodec])
 (let [codec (URLCodec. "Shift-JIS")]
   (.decode codec "%83%82%83W%83o%83P%83R%83%8F%83C" "Shift-JIS"))
 ;; => "モジバケコワイ"

We came up with 2 work arounds for this issue:

  1. Use org.apache.commons.codec.net.URLCodec instead of java.net.URLDecoder for decoding URL encoded parameteres
  2. Use a form with an enctype of multipart/form-data so that nothing gets encoded and thus avoid the problem entirely

Would appreciate the if I can get feed back on:

  1. Which of the above workaround is preferable?
  2. Would you be interested in a PR that replaces the decoder used in the params middleware with org.apache.commons.codec.net URLCodec?

Thanks!

@weavejester
Copy link
Member

Thanks for the feedback! It's odd that the Java URLDecoder doesn't work for the Shift-JIS charset.

If the Commons URLCodec works better, and there's no other side effects, then I think we should swap to that in the Ring-Codec library. It already depends on the commons-codec library, so I don't think there should be any disadvantage to it. I've done some quick benchmarks, and commons-codec seems about twice as fast decoding anyway.

@weavejester
Copy link
Member

Looks like it'll take a bit of work to update the Ring-Codec library. As well as the URLDecoder, Ring-Codec has its own custom implementation for percent encoding that suffers from the same issue.

@iku000888
Copy link
Author

Thanks for taking a look at this!
Let me know if there is anything I can help with!

iku000888 added a commit to iku000888/ring-codec that referenced this issue Dec 7, 2016
- This addresses the following URLdecoding issue
ring-clojure/ring#269
iku000888 added a commit to iku000888/ring-codec that referenced this issue Dec 7, 2016
- This addresses the following URLdecoding issue
ring-clojure/ring#269
@iku000888
Copy link
Author

I took a while that the scope of the issue is in ring-codec.
I have created ring-clojure/ring-codec#15
which just replaces the decoder. Maybe it is better to move the discussion over there?

@weavejester weavejester added the bug label Jun 4, 2017
@kumarshantanu
Copy link

kumarshantanu commented Jan 1, 2018

I have faced a similar issue where wrap-params causes the handler to sometimes throw java.io.UnsupportedEncodingException due to bad encoding. It would be great to return HTTP 400 as a response to such requests - I am currently using a workaround on top of wrap-params by applying a middleware to intercept the exception and throw HTTP 400.

Calling (.getMessage ^UnsupportedEncodingException ex) gives the encoding. HTTP 415 may be an alternative status code for this use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants