Should tartufo insist on valid base64 encodings? #278
Unanswered
rbailey-godaddy
asked this question in
Q&A
Historically, we did a spotty job of this. The original `get_strings_of_set()` allowed `=` to appear anywhere in a "base64" string, although technically it is only permitted as padding at the end. There's a PR in review that replaces this function with a regex-based `find_string_encodings()` that does a better (but still imperfect) job of matching only legal base64 encodings.

However, the reason I bring this up is that we have a request to add support for base64url encodings. By way of quick recap, base64url is just like base64 except it replaces `+` and `/` in the encoding alphabet with `-` and `_`. I am considering speed-vs-accuracy tradeoffs in possible implementations. I see three options:

You do not see "option 4 - make the user tell us with a command line flag" because there is nothing that says a repository -- or even a single file -- might not have both base64 and base64url encodings in it.
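To make the padding rule and the alphabet difference concrete, here is a minimal sketch using only the Python standard library. The two patterns (`LAX_BASE64` and `STRICT_BASE64`) are illustrative names of my own and are not the actual code in `get_strings_of_set()` or `find_string_encodings()`; they only show the general idea of "`=` anywhere" versus "`=` only as trailing padding".

```python
import base64
import re

# Two bytes chosen so that both alphabet-specific characters show up.
raw = b"\xfb\xff"

standard = base64.b64encode(raw).decode("ascii")          # uses + and /
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")  # uses - and _

# Hypothetical "lax" pattern: treats "=" as just another alphabet character,
# so it accepts "=" in the middle of a string.
LAX_BASE64 = re.compile(r"[A-Za-z0-9+/=]+")

# Hypothetical "strict" pattern: whole 4-character groups, optionally followed
# by one final group that is padded with "=" or "==".
STRICT_BASE64 = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)

print(standard, url_safe)
print(bool(LAX_BASE64.fullmatch("ab=cd")))     # lax pattern accepts misplaced "="
print(bool(STRICT_BASE64.fullmatch("ab=cd")))  # strict pattern rejects it
```

The strict pattern also enforces the valid-length rule (a multiple of four characters once padding is included), which is extra work a pure entropy scan may not need.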
Given that we are looking for entropy and not sanity-checking validity, I am very drawn to option 2 due to its efficiency and simplicity, but I am looking for feedback on the real-world consequences of this strategy. There are two corner cases I have thought of:
1. A string could contain both `+` (from base64) and `_` (from base64url). Do we really care if this string is examined for high entropy, even if it is not actually a valid known encoding? History suggests that we do not. (Note previous mishandling of `=` and lack of valid-length check.)
2. Consider text such as `onething-anotherthing`. Previously this would be considered as two separate base64 strings (`onething` and `anotherthing`), but now we would see one `onething-anotherthing` string. Previous findings that might be excluded by signature would no longer match, because the detected string had changed. A special case of this would be something like `_onething`, which previously would return `onething` and now would return `_onething`.

As you consider the second point, think about the likelihood of an otherwise-valid base64 encoding appearing in text immediately adjacent to either a `-` or `_` (the two new characters). This seems to me to be unlikely -- but if I thought it was totally safe, I wouldn't be asking.
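Both corner cases can be reproduced with a small sketch. It assumes the approach under discussion amounts to scanning with one character class that accepts both alphabets, which is what the corner cases appear to describe. The pattern names and the `find_candidates()` helper are hypothetical and are not part of tartufo.

```python
import re

# Hypothetical patterns for illustration only.
BASE64_ONLY = re.compile(r"[A-Za-z0-9+/]+={0,2}")
COMBINED = re.compile(r"[A-Za-z0-9+/_-]+={0,2}")  # also accepts "-" and "_"


def find_candidates(pattern, text):
    """Return every substring of text that the pattern matches."""
    return pattern.findall(text)


for sample in ("ab+cd_ef", "onething-anotherthing", "_onething"):
    print(sample)
    print("  base64 only:", find_candidates(BASE64_ONLY, sample))
    print("  combined:   ", find_candidates(COMBINED, sample))
```

In each sample the combined pattern returns a different string than the base64-only pattern, so anything derived from the detected string, such as an exclusion signature, would presumably change with it.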