Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The proxied request path is URI decoded #1575

Open
NorthFury opened this issue Oct 30, 2020 · 19 comments
Open

The proxied request path is URI decoded #1575

NorthFury opened this issue Oct 30, 2020 · 19 comments
Labels
bugfix Bug fixes and patches

Comments

@NorthFury
Copy link
Member

Describe the bug
Skipper mutates the request path and decodes the encoded characters.

To Reproduce
Have a stub service on 8082 logging the request path.
Start skipper with the following routes:
test: * -> "http://localhost:8082";
Requests executed:

curl http://localhost:8082/aha/https:%2F%2Fin
curl http://localhost:9090/aha/https:%2F%2Fin

The logged request paths were:

/aha/https:%2F%2Fin
/aha/https:/in

Expected behavior
Keep the encoded characters as they were.
Observed behavior
The encoded characters were decoded and even more, the double slash was normalized to one slash because those characters were no longer encoded.

@NorthFury NorthFury added the bugfix Bug fixes and patches label Oct 30, 2020
@szuecs
Copy link
Member

szuecs commented Oct 30, 2020

The bug was introduced in #1547

As far as I understand in the discussion the issue is the double / is merged into only one /.
The change of logging should not be a problem, right?

@rkeytacked
Copy link

Actually the bug is a combination of URI-unescaping together with normalization.
Before #1547 already the unescaping was violating the W3C recommendations:

According to https://www.w3.org/Addressing/URL/4_URI_Recommentations.html URL-sensitive characters

[…] may NEVER be encoded and unencoded in this way. […]

Example 2
The URIs
http://www.w3.org/albert/bertram/marie-claude
and
http://www.w3.org/albert/bertram%2Fmarie-claude
are NOT identical, as in the second case the encoded slash does not have hierarchical significance.

Currently [%24,%26,%2B..%3B,%3D,%40..%7A] are being unescaped, i.e. including the / ? & characters

@rkeytacked
Copy link

The logging is not a problem

@szuecs
Copy link
Member

szuecs commented Oct 30, 2020

@rkeytacked ok thanks

@NorthFury
Copy link
Member Author

I tested the output of httppath.Clean and it does not touch the escaped characters. This is in reference to the linked PR.

The issue here is that the escaped characters are not preserved for the request path received by the backend.

@aryszka
Copy link
Contributor

aryszka commented Oct 30, 2020

@NorthFury do you mean that it's not the linked change that causes the problem? #1547

@NorthFury
Copy link
Member Author

@aryszka I just ran my test against #1547 and it is the change that introduced the bug.

@AlexanderYastrebov
Copy link
Member

AlexanderYastrebov commented Oct 30, 2020

/cc @tomlarkworthy

@aryszka
Copy link
Contributor

aryszka commented Oct 30, 2020

My suggestion is to revert #1547 and so return to earlier behavior, and to support the use case of #1547 , we should create a filter that achieves the same.

@tomlarkworthy would this approach work for you?

@rkeytacked
Copy link

@aryszka Actually I would feel better not to decode the slash %2F. Then the URL normalization would also not hurt anymore.

@rkeytacked
Copy link

Actually this very commit wanted to avoid what is happening right now: d79a87f

Somehow the intention is not met.

@NorthFury
Copy link
Member Author

The proposal to to the normalization using a filter is the better approach for a proxy. A request in general and more specifically the request path should not be mutated unless explicitly via a filter or a flag.

@rkeytacked
Copy link

@NorthFury but that also means that decoding %-characters should only happen if wanted, right?

@NorthFury
Copy link
Member Author

@rkeytacked yes, any change to the request path should be explicit.

@tomlarkworthy
Copy link
Contributor

tomlarkworthy commented Oct 30, 2020

My original issue is that the regex path filers diverge from what gets sent upstream. So even if you set a filter, you might get something different hit upstream that looks like it should have been filtered out. This is being observed in our services ATM.

Adding an additional explicit normalize filter would satisfy our needs, but I kind of feel by setting a PathFilter I have already opted into a certain shape of upstream request, so I think what would be cleanest is if the PathFilter does the url normalization mutation? So not a different normalization filter, this should be done by the PathFilter? But would that break your use case @NorthFury ?

@szuecs
Copy link
Member

szuecs commented Oct 30, 2020

@tomlarkworthy do you mean the Path predicate, if you write about PathFilter?

@NorthFury
Copy link
Member Author

NorthFury commented Oct 30, 2020

@tomlarkworthy I don't have specific requirements apart for not to decode the reserved characters.
@tomlarkworthy when you mention the PathFilter do you mean the Path predicate? If so, I would say no. Predicates should not mutate requests.

Let me summarize the problem as I understand it and various directions this can go into.
Normalization of the URI is required for both filters and for route matching in order to have the system behave in a predictable manner. From this point of view, I can see how it would be a good idea to have this happen as soon as possible in the request life cycle.

Here are the normalization changes that can be applied:

  • dot segments: this is implemented by httppath.Clean except it also de-duplicates slashes and forces an empty path to /
    • do note that de-duplicating slashes changes the semantic of the path but has been like this in skipper for a very long time
  • force encoded characters to uppercase: for example %5a to %5A
  • decoding unreserved characters:
    • alpha: %41-%5A and %61-%7A
    • digit: %30-%39
    • hyphen: %2D
    • period: %2E
    • underscore: %5F
    • tilde: %7E

One thing to note is that for route matching, the match for encoded characters should be case insensitive, for example Path("/%5a") should match both /%5a and /%5A. For the Path predicate this can be ensured by normalizing the predicate argument. PathRegexp is the case where it needs to be explicit what the input would look like, if any normalization is applied beforehand.

One solution for this would be to enforce a redirect to a normalized version of the URL. This can be done via a filter. What this doesn't solve is the fact that the same normalization flow should be applied to the route matching.

Another approach is to normalize the path from the start. The question remains if one should be able to control this behavior via a flag. If the flag is set in order not to normalize the path, this needs to happen only on matching. This particular approach could also benefit from having a filter to enforce a normalized path if required.

It feels like providing the path normalization from the start of the request offers a few distinct advantages.

@tomlarkworthy
Copy link
Contributor

OK an explicit opt-in normalization filter seems to be safest choice.

@szuecs
Copy link
Member

szuecs commented Nov 13, 2020

@NorthFury I think we can close this issue and create a new one to add the requested filter, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix Bug fixes and patches
Projects
None yet
Development

No branches or pull requests

6 participants