Fix SanitizeJson when string contains escaped quotes (or nested json) #173
`SanitizeJson` is currently broken for strings containing nested quoted strings (or nested JSON). The code checks whether characters are already escaped and skips escaping them again, which means that nested strings are not properly decoded and break a JSON lexer in a subsequent log pipeline.

This PR makes the encoding unconditional. Specifically: `<cr>` will become `\r`, `<lf>` will become `\n`, `"` will become `\"`, and `\` will become `\\`.

A unit test is added that fails against the repo as-is and passes with this PR.
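The four rules above can be sketched in a few lines. This is only an illustration in Python, not the code in this PR: `sanitize_json` is a hypothetical stand-in for `SanitizeJson`, and wrapping the result in double quotes is assumed from the example output in this description.

```python
import json


def sanitize_json(raw: str) -> str:
    """Hypothetical stand-in for SanitizeJson: escape every special
    character unconditionally and wrap the result in double quotes."""
    # Backslashes must be handled first. Otherwise the backslashes that the
    # later replacements introduce would themselves be doubled.
    escaped = (
        raw.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\r", "\\r")
        .replace("\n", "\\n")
    )
    return '"' + escaped + '"'


if __name__ == "__main__":
    line = r'{"foo":"bar","nested_quotes":"this string \"contains\" quotes"}'
    sanitized = sanitize_json(line)
    print(sanitized)
    # Decoding the sanitized string gives back the original line unchanged.
    assert json.loads(sanitized) == line
```

Because nothing is skipped for being "already escaped", decoding the result once always returns the exact input, which is what a downstream JSON lexer relies on.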
Example
Assume the JSON
{"foo":"bar","nested_quotes":"this string \"contains\" quotes"}
is logged from an application as a single line of text.

With the current implementation of `SanitizeJson`, the nested quotes in the output are missing an extra `\\` and therefore accidentally terminate the enclosing string.

After this change:
"{\"foo\":\"bar\",\"nested_quotes\":\"this string \\\"contains\\\" quotes\"}"
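To see why the difference matters downstream, the sketch below compares the two strategies on the example line. Both functions are hypothetical Python reconstructions for illustration: `escape_if_unescaped` only approximates the conditional behaviour described in this PR (it is not the repo's actual code), and `survives_pipeline` assumes a log pipeline that first decodes the sanitized string and then parses the recovered JSON.

```python
import json
import re

LINE = r'{"foo":"bar","nested_quotes":"this string \"contains\" quotes"}'


def escape_if_unescaped(raw: str) -> str:
    """Assumed approximation of the old behaviour: a quote that already
    has a backslash in front of it is left alone."""
    escaped = re.sub(r'(?<!\\)"', lambda m: '\\"', raw)
    return '"' + escaped + '"'


def escape_unconditionally(raw: str) -> str:
    """Sketch of the new behaviour: backslashes and quotes are always escaped."""
    return '"' + raw.replace("\\", "\\\\").replace('"', '\\"') + '"'


def survives_pipeline(sanitized: str) -> bool:
    """Decode the sanitized string, then parse the JSON it should contain."""
    try:
        inner = json.loads(sanitized)  # downstream stage decodes the string
        json.loads(inner)              # and then parses the original JSON
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    for sanitize in (escape_if_unescaped, escape_unconditionally):
        result = sanitize(LINE)
        print(sanitize.__name__, result, survives_pipeline(result))
```

With conditional escaping, the first decode turns each nested `\"` into a bare `"`, so the recovered JSON ends the `nested_quotes` value early and the second parse fails. With unconditional escaping, the first decode restores the original line exactly.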