The proxied request path is URI decoded #1575

NorthFury · 2020-10-30T11:17:26Z

Describe the bug
Skipper mutates the request path and decodes the encoded characters.

To Reproduce
Have a stub service on 8082 logging the request path.
Start skipper with the following routes:
test: * -> "http://localhost:8082";
Requests executed:

curl http://localhost:8082/aha/https:%2F%2Fin
curl http://localhost:9090/aha/https:%2F%2Fin

The logged request paths were:

/aha/https:%2F%2Fin
/aha/https:/in

Expected behavior
Keep the encoded characters as they were.
Observed behavior
The encoded characters were decoded and even more, the double slash was normalized to one slash because those characters were no longer encoded.

The text was updated successfully, but these errors were encountered:

szuecs · 2020-10-30T11:24:53Z

The bug was introduced in #1547

As far as I understand in the discussion the issue is the double / is merged into only one /.
The change of logging should not be a problem, right?

rkeytacked · 2020-10-30T11:27:06Z

Actually the bug is a combination of URI-unescaping together with normalization.
Before #1547 already the unescaping was violating the W3C recommendations:

According to https://www.w3.org/Addressing/URL/4_URI_Recommentations.html URL-sensitive characters

[…] may NEVER be encoded and unencoded in this way. […]

Example 2
The URIs
http://www.w3.org/albert/bertram/marie-claude
and
http://www.w3.org/albert/bertram%2Fmarie-claude
are NOT identical, as in the second case the encoded slash does not have hierarchical significance.

Currently [%24,%26,%2B..%3B,%3D,%40..%7A] are being unescaped, i.e. including the / ? & characters

rkeytacked · 2020-10-30T11:27:14Z

The logging is not a problem

szuecs · 2020-10-30T11:29:10Z

@rkeytacked ok thanks

NorthFury · 2020-10-30T11:33:08Z

I tested the output of httppath.Clean and it does not touch the escaped characters. This is in reference to the linked PR.

The issue here is that the escaped characters are not preserved for the request path received by the backend.

aryszka · 2020-10-30T11:42:26Z

@NorthFury do you mean that it's not the linked change that causes the problem? #1547

NorthFury · 2020-10-30T11:49:41Z

@aryszka I just ran my test against #1547 and it is the change that introduced the bug.

AlexanderYastrebov · 2020-10-30T12:35:51Z

/cc @tomlarkworthy

aryszka · 2020-10-30T12:47:02Z

My suggestion is to revert #1547 and so return to earlier behavior, and to support the use case of #1547 , we should create a filter that achieves the same.

@tomlarkworthy would this approach work for you?

rkeytacked · 2020-10-30T12:49:04Z

@aryszka Actually I would feel better not to decode the slash %2F. Then the URL normalization would also not hurt anymore.

rkeytacked · 2020-10-30T12:52:39Z

Actually this very commit wanted to avoid what is happening right now: d79a87f

Somehow the intention is not met.

NorthFury · 2020-10-30T12:59:55Z

The proposal to to the normalization using a filter is the better approach for a proxy. A request in general and more specifically the request path should not be mutated unless explicitly via a filter or a flag.

rkeytacked · 2020-10-30T13:05:19Z

@NorthFury but that also means that decoding %-characters should only happen if wanted, right?

NorthFury · 2020-10-30T13:17:52Z

@rkeytacked yes, any change to the request path should be explicit.

tomlarkworthy · 2020-10-30T13:43:55Z

My original issue is that the regex path filers diverge from what gets sent upstream. So even if you set a filter, you might get something different hit upstream that looks like it should have been filtered out. This is being observed in our services ATM.

Adding an additional explicit normalize filter would satisfy our needs, but I kind of feel by setting a PathFilter I have already opted into a certain shape of upstream request, so I think what would be cleanest is if the PathFilter does the url normalization mutation? So not a different normalization filter, this should be done by the PathFilter? But would that break your use case @NorthFury ?

szuecs · 2020-10-30T15:07:49Z

@tomlarkworthy do you mean the Path predicate, if you write about PathFilter?

NorthFury · 2020-10-30T15:28:02Z

@tomlarkworthy I don't have specific requirements apart for not to decode the reserved characters.
@tomlarkworthy when you mention the PathFilter do you mean the Path predicate? If so, I would say no. Predicates should not mutate requests.

Let me summarize the problem as I understand it and various directions this can go into.
Normalization of the URI is required for both filters and for route matching in order to have the system behave in a predictable manner. From this point of view, I can see how it would be a good idea to have this happen as soon as possible in the request life cycle.

Here are the normalization changes that can be applied:

dot segments: this is implemented by httppath.Clean except it also de-duplicates slashes and forces an empty path to /
- do note that de-duplicating slashes changes the semantic of the path but has been like this in skipper for a very long time
force encoded characters to uppercase: for example %5a to %5A
decoding unreserved characters:
- alpha: %41-%5A and %61-%7A
- digit: %30-%39
- hyphen: %2D
- period: %2E
- underscore: %5F
- tilde: %7E

One thing to note is that for route matching, the match for encoded characters should be case insensitive, for example Path("/%5a") should match both /%5a and /%5A. For the Path predicate this can be ensured by normalizing the predicate argument. PathRegexp is the case where it needs to be explicit what the input would look like, if any normalization is applied beforehand.

One solution for this would be to enforce a redirect to a normalized version of the URL. This can be done via a filter. What this doesn't solve is the fact that the same normalization flow should be applied to the route matching.

Another approach is to normalize the path from the start. The question remains if one should be able to control this behavior via a flag. If the flag is set in order not to normalize the path, this needs to happen only on matching. This particular approach could also benefit from having a filter to enforce a normalized path if required.

It feels like providing the path normalization from the start of the request offers a few distinct advantages.

tomlarkworthy · 2020-10-30T15:58:54Z

OK an explicit opt-in normalization filter seems to be safest choice.

szuecs · 2020-11-13T19:24:13Z

@NorthFury I think we can close this issue and create a new one to add the requested filter, right?

NorthFury added the bugfix Bug fixes and patches label Oct 30, 2020

szuecs mentioned this issue Oct 30, 2020

Revert "Normalizes URLs for upstream" #1576

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The proxied request path is URI decoded #1575

The proxied request path is URI decoded #1575

NorthFury commented Oct 30, 2020

szuecs commented Oct 30, 2020 •

edited

Loading

rkeytacked commented Oct 30, 2020

rkeytacked commented Oct 30, 2020

szuecs commented Oct 30, 2020

NorthFury commented Oct 30, 2020

aryszka commented Oct 30, 2020

NorthFury commented Oct 30, 2020

AlexanderYastrebov commented Oct 30, 2020 •

edited

Loading

aryszka commented Oct 30, 2020

rkeytacked commented Oct 30, 2020

rkeytacked commented Oct 30, 2020

NorthFury commented Oct 30, 2020

rkeytacked commented Oct 30, 2020

NorthFury commented Oct 30, 2020

tomlarkworthy commented Oct 30, 2020 •

edited

Loading

szuecs commented Oct 30, 2020

NorthFury commented Oct 30, 2020 •

edited

Loading

tomlarkworthy commented Oct 30, 2020

szuecs commented Nov 13, 2020

The proxied request path is URI decoded #1575

The proxied request path is URI decoded #1575

Comments

NorthFury commented Oct 30, 2020

szuecs commented Oct 30, 2020 • edited Loading

rkeytacked commented Oct 30, 2020

rkeytacked commented Oct 30, 2020

szuecs commented Oct 30, 2020

NorthFury commented Oct 30, 2020

aryszka commented Oct 30, 2020

NorthFury commented Oct 30, 2020

AlexanderYastrebov commented Oct 30, 2020 • edited Loading

aryszka commented Oct 30, 2020

rkeytacked commented Oct 30, 2020

rkeytacked commented Oct 30, 2020

NorthFury commented Oct 30, 2020

rkeytacked commented Oct 30, 2020

NorthFury commented Oct 30, 2020

tomlarkworthy commented Oct 30, 2020 • edited Loading

szuecs commented Oct 30, 2020

NorthFury commented Oct 30, 2020 • edited Loading

tomlarkworthy commented Oct 30, 2020

szuecs commented Nov 13, 2020

szuecs commented Oct 30, 2020 •

edited

Loading

AlexanderYastrebov commented Oct 30, 2020 •

edited

Loading

tomlarkworthy commented Oct 30, 2020 •

edited

Loading

NorthFury commented Oct 30, 2020 •

edited

Loading