-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Selecting from JSON columns does not decode them from UTF-8 under mysql_enable_utf8/mb4 #309
Comments
I use dbdeployer to test with multiple versions of MySQL and MariaDB. The above code for me returns:
This is with MySQL 8.0.13 and perl v5.28.1 |
Which JSON standard? There are at least 5 JSON standards which are with each other incompatible. So everybody who is going to support JSON, should exactly specify which JSON (according to which standard) supports... |
This is the incorrect response. It should be decoded from UTF-8. |
As I mentioned, it is a tricky problem because JSON as a standard implies the UTF-8 encoding. But in the case where the data interchange format also defines a charset, the JSON encoding and decoding should not also encode/decode the charset or you will double encode the data, and the data stored in the binary format will end up being the UTF-8 bytes instead of the correct JSON data. This is how it works already when you pass a JSON string as a parameter: it should not be encoded, since DBD::mysql will encode it to UTF-8 (wrongly, but that is the intention). So on retrieval, it should have been decoded as a text response from UTF-8, and the application then decodes it from JSON without decoding the charset again. As mentioned, the correct behavior is the behavior of DBD::Pg with json and jsonb columns, and in another test it seems to be the case if you select the output of a JSON_OBJECT() function call or similar; but not when selecting a JSON column from MySQL. As for the JSON standard dictating UTF-8, MySQL already adheres to this by the nature of always interpreting strings as utf8mb4 in the JSON context. |
@dveeden We have a small test at jhthorsen/mojo-mysql#57 (comment) followed by some discussion. I'd appreciate to hear your opinion about that. |
from https://metacpan.org/pod/DBD::mysql#mysql_enable_utf8 :
Iis not clear why json data type is different from any other text types. This is obviously a bug in DBD::mysql and needs to be fixed. |
FYI: I have opened a bug against MySQL here, but I still have no way to reproduce or test it myself. https://bugs.mysql.com/bug.php?id=95698 |
Hello folks, I'm stumbling into this bug now. I posted it as a question on Stack Overflow before finding this discussion. My post includes a self-contained unit test that replicates/illustrates the problem: My question is, while waiting for a fix, does anyone have any recommendations or suggestions for how to work around it in a way that is backward-compatible to a possible eventual fix to DBD::Mysql. I wouldn't want to add my own decode after retrieving a JSON column, only to have some later DBD::Mysql also decode. Thanks! |
You need to specify your connection encoding to mysql server. Then in a Perl script, dump these variables so that you can see what's going on:
and these variables as well....
then select a single character and check how it ends up in your application:
Make sure that the character and byte lengths make sense depending on your database and connection encoding and check if the utf8 flag is set when appropriate. That's how you can debug your issues. |
I don't have a new enough MySQL handy, but in a bug report for Mojo::mysql it was indicated with a test case that when selecting from a JSON type column, the output is not UTF-8 decoded (jhthorsen/mojo-mysql#57 (comment)). MySQL documents that when JSON is interacted with as a string it is treated as the utf8mb4 charset so it should be returned that way, the same way it is passed as a utf8mb4 string.
Confusing the issue is DBD::mysql's misuse of the internal encoding of strings, but it's possible to avoid that confusion by ensuring the passed JSON string is
utf8::upgrade
d or constructing the JSON within the query from bytes.The issue should be reproducible by inserting into a JSON type column with a query like so:
then selecting that column. JSON and encoding is a tricky subject because it is commonly represented as UTF-8 encoded bytes rather than text, but as MySQL treats it on input and output as a character string, it should be decoded as the input string is encoded; this also is what happens in DBD::Pg.
The text was updated successfully, but these errors were encountered: