Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request: Add ability to not interpret file substitution chars with arg instead of only escaping. #177

Open
PicoMitchell opened this issue May 4, 2022 · 18 comments

Comments

@PicoMitchell
Copy link

If some values are coming from user input, it is possible (however unlikely) that someone could input a literal string that start with @, %, or :. If this were to happen, jo would try to interpret those values as files to read and would fail to create JSON if the file doesn't exist (or create unintended JSON if the file does exist).

Currently, the only way to avoid this is to manually escape those characters with a \. But, if these values are coming from user input having to do any pre-processing and escaping on those values before passing them to jo kind defeats some of the great value of jo being able to make valid JSON without needing to manually escape things like double quotes within values.

To avoid this possible issue when values should be treated literally no matter what without anything needing to be manually escaped within the value, I'm thinking there could be a new coercion-style argument and/or a global arg to not interpret those special jo characters. I could see value in both a global option and a per value option depending on the scenario. I'm thinking something like -l (for "literal") could be good candidates for this kind of option.

Hopefully, this could be combined with existing coercion args so that something like jo -- -ls "key=${user_input_value}" could be done to always interpret user_input_value as a "literal string" without needing to escape any special characters in the string.

Thanks so much for considering this as well as for the amazingly useful jo!

@PicoMitchell
Copy link
Author

Or, could it make sense to just not interpret those special characters when -s coercion alone is specified? I could see that making sense for not interpreting % and : since those resulting base64 and JSON values would never need or want type coercion, but I could see someone wanting to read the contents of a file with @ AND coercing those contents into a string.

@PicoMitchell PicoMitchell changed the title Request: Add ability to ignore file substitutions chars with arg instead of only escaping. Request: Add ability to not interpret file substitutions chars with arg instead of only escaping. May 4, 2022
@PicoMitchell PicoMitchell changed the title Request: Add ability to not interpret file substitutions chars with arg instead of only escaping. Request: Add ability to not interpret file substitution chars with arg instead of only escaping. May 4, 2022
@jpmens
Copy link
Owner

jpmens commented May 4, 2022

Unchecked user input? Really? ;) I'm not sure jo is good for that ...

@PicoMitchell
Copy link
Author

PicoMitchell commented May 4, 2022

I was considering that passing the user input data through jo to turn it into valid JSON would itself be the act of checking/sanitizing it.

Once the values are within valid JSON and passed to something that only accepts validated JSON, there would likely be no way for malicious user input to be interpreted in any other way than the intended raw values when that valid JSON is properly interpreted later.

@PicoMitchell
Copy link
Author

I think this kind of usage to take user input, encapsulate it in JSON, and then pass to curl --data is not an uncommon pattern: https://unix.stackexchange.com/a/339892

Another scenario could be a password needing to be specified within JSON data to be passed to curl for authentication of an endpoint. It is not unlikely that a password could start with any of those special characters and it may not be expected to need to manually escape/modify that password before being passed to jo: https://unix.stackexchange.com/a/656692

These are just a couple quick examples I've found, but I'm sure there are more out in the wild.

@jpmens
Copy link
Owner

jpmens commented May 5, 2022

could be a password needing to be specified within JSON data to be passed to curl for authentication of an endpoint.

You have convinced me with that example.

but I could see someone wanting to read the contents of a file with @ AND coercing those contents into a string

Indeed; I think we shouldn't mess with that.

That would leave us with a new option to be implemented, something like -N for "No files".

@gromgit what do you think?

@gromgit
Copy link
Collaborator

gromgit commented May 5, 2022

Or, could it make sense to just not interpret those special characters when -s coercion alone is specified?

I think that would be the most expedient and easily-understood solution, but I should sleep on it first, in case there are corner cases that haven't come to mind.

I could see that making sense for not interpreting % and : since those resulting base64 and JSON values would never need or want type coercion, but I could see someone wanting to read the contents of a file with @ AND coercing those contents into a string.

That's already the case: =@ is documented as substituting file contents as-is, and the only logical "container" for that is a string:

$ echo 12345 > test.txt

$ jo [email protected]
{"a":"12345"}

@PicoMitchell
Copy link
Author

PicoMitchell commented May 5, 2022

That's already the case: =@ is documented as substituting file contents as-is, and the only logical "container" for that is a string

Ah, very cool. I should have tested my assumptions on that!

Making -s never interpret special jo chars would be great for my use case! And, I hope that change wouldn't break anybody else's existing code. It seems like a relatively safe assumption that others that are using -s to coerce values are using it with the intention that they want that exact string value set and nothing else.

In fact, after considering that this issue could happen and before posted the request, I made sure that -s didn't already behave like that since it seemed like a reasonable assumption that -s might already coerce the value into a literal string even if it starts with a special jo char.

@gromgit
Copy link
Collaborator

gromgit commented May 5, 2022

Yeah, I'm actually thinking that the current behavior is a bug, and bugs need fixing. 😀

gromgit added a commit to gromgit/jo that referenced this issue May 5, 2022
Current implementation turns `-s a=null` into `{"a":""}` instead of `{"a":"null"}`,
and interprets file inclusion metacharacters at the beginning of values, e.g.
`-s a=@file` returns `{"a":"<contents of file>"}`, but users would generally expect
`{"a":"@file"}` instead.

Fixes jpmens#177.
@gromgit
Copy link
Collaborator

gromgit commented May 5, 2022

While I'm fixing this string coercion bug, it would be good to address the behavior of the other coercion options. Specifically, if file contains 12345, it's clear now that -s a=@file should generate {"a":"@file"}, but should -n and -b also refuse to interpret metacharacters? If so, then:

  • -n a=@file should generate {"a":5} instead of the current {"a":"12345"}
  • -b a=@file should generate {"a":true} instead of the current {"a":"12345"}

@jpmens, your thoughts?

@jpmens
Copy link
Owner

jpmens commented May 5, 2022

@gromgit why would -n a=@file (with file containing 12345) generate {"a": 5} and not {"a": 12345} seeing that 12345 is a valid number?

Today we have:

$ jo -n a=12345
{"a":12345}

$ cat /tmp/file
12345

$ jo -n a=@/tmp/file
{"a":"12345"}

@PicoMitchell
Copy link
Author

PicoMitchell commented May 5, 2022

I hesitate to bring this up because it would break existing functionality, but I'm wondering if it might make more sense for this jo file metacharacter functionality to actually be implemented as coercion arguments themselves (and not have any special metacharacters).

-c could be for contents (equivalent to @), -E could be for base64 encoding (equivalent to %), and -j could be for JSON (equivalent to :). Or, any arguments that you prefer.

For file contents, it seems like it would be intuitive to the user for all the regular type casting to take place just the same as if those exact content were specified on the command line. (It does seem odd that currently jo -n a=@/tmp/file behaves differently than jo -n a="$(cat /tmp/file)")

But also, the -c option could be combined with existing coercion args to be able to do -ci to get contents as an int or -cs for contents as a string, etc. For -E and -j, those would take precedence and any other coercion args would be ignored.

This way, there would be no ambiguity or conflict when only -s, -i, or -b are specified even if the value happens to be a path and there would be no special metacharacters in any value that could change any of the normal behavior.

I think it may also add some level of security in that jo would never access the filesystem unless explicitly instructed to do so with an argument. I feel it's much less easy to include an argument by accident or ignorance vs having a special metacharacter that someone may not be aware of the significance of in their values.

@PicoMitchell
Copy link
Author

PicoMitchell commented May 5, 2022

In fact, if @ (or a theoretical -c) did the standard type casting, that would mean that : (or a theoretical -j) would not actually be necessary at all.

A file containing JSON contents would always be interpreted as JSON unless -s coercion was specified (assuming in this scenario that the combination of @ and -s would still read the file contents vs interpreting the string literally).

@PicoMitchell
Copy link
Author

Clearly, this is actually a bit more of a complex issue that I first considered. This discussion has just got me looking at the big picture now. I don't mean to muddy the waters with the idea of replacing the metacharacters with new coercion args, and I totally understand if that's out of the scope of what you want to consider doing right now (or ever) since it would break existing functionality.

@gromgit
Copy link
Collaborator

gromgit commented May 6, 2022

@gromgit why would -n a=@file (with file containing 12345) generate {"a": 5} and not {"a": 12345} seeing that 12345 is a valid number?

I was thinking that only having -s treat values as literals, while -n and -b continuing to parse values for metacharacters, can be confusing. In my mind, coercion operations should be consistent in dealing with their values.

But that was before I went to bed. Now, I'm wondering what I was thinking. 😀

That said, if file inclusion is still intended to work with -n and -b, then I think coercion should be done on the file contents, i.e. this:

$ jo -n a=@/tmp/numfile
{"a":"12345"}
$ jo -b a=@/tmp/boolfile
{"a":"true"}

is wrong, and should work like this instead:

$ jo -n a=@/tmp/numfile
{"a":12345}
$ jo -b a=@/tmp/boolfile
{"a":true}

@gromgit
Copy link
Collaborator

gromgit commented May 6, 2022

To summarize, current behavior is to include file as a string, regardless of coercion flags.

And here's how I think it should work:

  1. No coercion flags: Include file as string
  2. -s: Treat value as literal, do not include file
  3. -n: Include file, then parse included value as number
  4. -b: Include file, then parse included value as bool

I can't think of any other use case that makes sense.

@gromgit
Copy link
Collaborator

gromgit commented May 7, 2022

Sorry, just realized I'm over-complicating things...

Currently, the only way to avoid this is to manually escape those characters with a \. But, if these values are coming from user input having to do any pre-processing and escaping on those values before passing them to jo kind defeats some of the great value of jo being able to make valid JSON without needing to manually escape things like double quotes within values.

There's no need to preprocess or escape the raw values themselves. Use bash's quoting mechanisms to your advantage:

  • Grab a password from a variable: jo -- -s password=\\"$PW"
  • Get it with a utility: jo -- -s "password=\\$(askpass)"
  • Get it from a file: jo -- -s password=@/tmp/passwd.txt
  • Get it from a file mentioned in a variable: PWFILE=@/tmp/passwd.txt; jo -- -s password="${PWFILE}"

What is currently broken is the orthogonality between file inclusion and coercion, which prevents stuff like this from working:

jo -- -s b64_content=%/tmp/test.txt -n b64_len=%/tmp/test.txt

because file inclusion currently short-circuits before coercions can be applied.

I propose to fix it by first canonicalizing the value:

if value[0] == '\':
    value = value[1:end]
elif value starts with metacharacter:
    value = process_metacharacter(value)
else:
    don't mess with value

then applying any coercions on the canonical value. In my mind, that makes things consistent and minimizes unpleasant surprises.

@PicoMitchell
Copy link
Author

PicoMitchell commented May 7, 2022

So if the first char is a \ then all values will just be interpreted literally whether or not the first char is a metachar and the leading \ will never be displayed? I didn't consider just always escaping the first char since I though that could mess things up when a metachar was not the first char, but that seems to not be the case. Or is there a scenario that this would not work as expected?

With some very quick tests, this behavior seems to work properly in the current version:

% jo -s test=\\"hmmm"
{"test":"hmmm"}
% jo -s test=\\"\hmmm"
{"test":"\\hmmm"}
% jo -s test=\\"@hmmm"
{"test":"@hmmm"}

BUT, this does reveal that \ is also a special char that I didn't consider before that I would think ideally should be interpreted literally by jo without any extra escaping needed under normal circumstances:

% jo -s test="\hmmm"
{"test":"hmmm"}

In this case, I would have expected the output to be {"test":"\\hmmm"}

And the fact that these special characters are only special when they are the first char makes jo's behavior less consistent and clear:

% jo test='oh\hmmm'
{"test":"oh\\hmmm"}

Having to escape special chars in values at all feels cryptic and unintuitive to me for a tool like this and somewhat detracts from the ease of being able to trust that jo will easily turn any value into valid JSON without a lot of hassle of escaping and considering special JSON chars. That being said, using jo in it's current form is still is much easier than trying to manually escape any shell variable to be a valid JSON value.

Still, after sitting with this for a while it feels like metachars are not the most intuitive way to implement file reading features for consistent and reliable behavior in all scenarios. I think using arguments instead would fit that need much better with what kind of behavior is generally expected from a CLI tool.

Said in another way, having metachars in values turn some possible literal values into kind of "commands". Having a value be interpreted literally or as a command without any way to disable/enable that behavior with an argument turns jo values into a tiny interpreted language in and of themselves. To reiterate, having this functionality implemented in this way loses what I considered to be the great value of jo, that any specified value will be directly placed into valid JSON as a literal value.

But, again, I understand if those kinds of large breaking changes are out of scope of what you want to do.

Regardless, thank you for showing this solution that can get the job done properly!

@PicoMitchell
Copy link
Author

PicoMitchell commented May 7, 2022

At the very least, this does bring me back to thinking that having a -l (for "literal") coercion argument which essentially acts identically to always including \ as the first char in a value would be a more obvious and intuitive solution for a command line tool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants