Misguided base64 encoding #9

Open
antong opened this issue May 7, 2019 · 9 comments

Comments

@antong

antong commented May 7, 2019

The encoders and decoders Base64-encode the binary data. The blog post says this is done so that only alphanumeric data is passed to the QR encoder. I assume the reason is that QR has an alphanumeric mode that encodes "alphanumeric" data more efficiently. However, Base64 encoding doesn't help here, because the QR "alphanumeric" mode doesn't cover e.g. lowercase letters, only 0-9, A-Z, $%*+-./: and space. I suggest skipping the Base64 step to reduce the data size and improve performance.

@DonaldTsang

But would it work on inputs with arbitrary bytes (e.g. UTF-8) if the changes are made?

@antong
Author

antong commented May 14, 2019

Yes, sure. QR codes can contain arbitrary binary data.

@divan
Owner

divan commented May 14, 2019

Great input. You're right, I'm not sure whether the current approach is more efficient than just using QR's native binary mode. I was planning to run tests and do the math for that, but this project is currently near the bottom of my backlog.
Still, it's good to keep this in mind and return to the issue when I have some time.

@DonaldTsang

@antong So there is text mode, which we can use base64 or some other arbitrary base on, or we can use binary mode with UTF-8, which would be good to test on.

@divan take it low and slow, my friend. It is better to have a better product later than a bad one now.

@antong
Author

antong commented May 15, 2019

The QR encoder selects the data encoding mode automatically, and it does quite a good job of choosing binary, text or numeric mode depending on the bytes being encoded. If you Base64-encode the data first, you immediately incur 33% overhead, and on top of that the QR encoder can probably no longer select any mode other than binary. So if the original data contained sequences of numeric or alphanumeric bytes, it could have been encoded even more efficiently without Base64.
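The 33% figure can be checked with the standard library alone. This is a small sketch, not code from the project; `base64Overhead` is a hypothetical helper name, and it assumes padded standard Base64 (`base64.StdEncoding`).

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// base64Overhead (hypothetical helper) returns the padded Base64 length
// for n raw bytes and the growth relative to n, in percent.
func base64Overhead(n int) (encoded int, percent float64) {
	encoded = base64.StdEncoding.EncodedLen(n)
	percent = 100 * float64(encoded-n) / float64(n)
	return encoded, percent
}

func main() {
	for _, n := range []int{12, 300, 1000} {
		enc, pct := base64Overhead(n)
		fmt.Printf("%d bytes -> %d Base64 chars (+%.1f%%)\n", n, enc, pct)
	}
}
```

Every 3 input bytes become 4 output characters, so the growth is one third whenever the input length is a multiple of 3, and slightly more otherwise because of padding.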

Example: encoding different 12-byte (96-bit) messages (https://play.golang.org/p/BC2892EZC9B):

  • Worst case without Base64 encoding first: the QR segment is 108 bits (byte mode).
  • If the message is numeric, the QR segment is only 54 bits.
  • If the message is alphanumeric, the QR segment is 79 bits.
  • Base64-encoding any 12-byte message first makes the QR segment 140 bits (about 30% larger than the worst case without Base64; the payload itself grows by 33%).
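The numbers in the list above can be reproduced with a few lines of arithmetic. The following is a sketch, not the code behind the playground link: it assumes a single segment in a version 1-9 QR code, where the segment consists of a 4-bit mode indicator, a character count field (10 bits for numeric, 9 for alphanumeric, 8 for byte mode), and the data bits. The function names are hypothetical.

```go
package main

import "fmt"

// numericBits returns the segment size for n digits (QR versions 1-9).
// Digits are packed 3 per 10 bits; a trailing group of 1 or 2 digits
// takes 4 or 7 bits.
func numericBits(n int) int {
	bits := 4 + 10 + 10*(n/3)
	switch n % 3 {
	case 1:
		bits += 4
	case 2:
		bits += 7
	}
	return bits
}

// alphanumericBits returns the segment size for n alphanumeric
// characters (QR versions 1-9): 11 bits per pair, 6 bits for a
// trailing single character.
func alphanumericBits(n int) int {
	return 4 + 9 + 11*(n/2) + 6*(n%2)
}

// byteBits returns the segment size for n bytes in byte mode
// (QR versions 1-9): 8 bits per byte.
func byteBits(n int) int {
	return 4 + 8 + 8*n
}

func main() {
	fmt.Println("byte, 12 raw bytes:        ", byteBits(12))         // 108
	fmt.Println("numeric, 12 digits:        ", numericBits(12))      // 54
	fmt.Println("alphanumeric, 12 chars:    ", alphanumericBits(12)) // 79
	fmt.Println("byte, 16 Base64 characters:", byteBits(16))         // 140
}
```

Larger QR versions use wider character count fields, so the absolute numbers shift slightly there, but the ordering of the modes stays the same.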

@DonaldTsang

DonaldTsang commented May 16, 2019

@antong But I think QR codes can encode raw bytes as well, which is better than Base64 followed by alphanumeric mode. Also, let us assume that the input is UTF-8 (or arbitrary data) and not only numbers or ASCII.

@antong
Author

antong commented May 16, 2019

QR can encode arbitrary bytes, yes; that is exactly my point. Base64-encoded bytes cannot normally be encoded using the QR alphanumeric mode, because QR alphanumeric is uppercase only. So Base64 encoding just adds 33% overhead and then practically forces the QR encoder to use "byte" mode for the data. My point is that it is always better to skip Base64 and let the QR encoder either use "byte" mode directly or optimize by using the more efficient alphanumeric or numeric modes where possible.

@DonaldTsang

QR byte encoding it is

@xulihang

Base64 increases the size by 33%. I've implemented a simple web app for reading animated QR codes that uses the bytes directly: https://github.com/xulihang/AnimatedQRCodeReader
