DBD-mysql-4.42.0 encodes binary blobs when storing [rt.cpan.org #121921] #217

Open
mbeijen opened this issue Nov 15, 2017 · 2 comments
Labels
utf8 Unicode and UTF-8 handling

mbeijen commented Nov 15, 2017

Migrated from rt.cpan.org#121921 (status was 'open')

From [email protected] on 2017-05-28 13:22:04:

Hi all

I have an application that, among other things, has to store PDF files in a
MySQL database. This has been running for almost 10 years now.

After upgrading DBD-mysql to 4.42.0, the PDF files get corrupted when
they are stored in the database. It appears that they are somehow encoded
in a character set (presumably utf8), even though the column definition is
"mediumblob".

4.41.0 and earlier versions do not show that behaviour; downgrading
DBD::mysql without any other changes restores the correct behaviour.


Here is how the db is connected:

my $dbh = DBI->connect(${dsn},
            ${username},
            ${passwd},
            { RaiseError => 1,
              AutoCommit => 1,
              AutoInactiveDestroy => 1,
              mysql_auto_reconnect => 1,
              mysql_enable_utf8 => 1,
            }
          )
or die("DB connect failed: $DBI::errstr");

The code then goes on to insert data into tables like this one:

+------------+---------------------+------+-----+---------+-------+
| Field      | Type                | Null | Key | Default | Extra |
+------------+---------------------+------+-----+---------+-------+
| id         | bigint(20) unsigned | NO   | PRI | NULL    |       |
| name       | text                | NO   |     | NULL    |       |
| data       | mediumblob          | NO   |     | NULL    |       |
| doctype    | text                | NO   |     | NULL    |       |
+------------+---------------------+------+-----+---------+-------+

like this:

my $sql = "INSERT INTO $table (id, name, data, doctype)
  VALUES (?, ?, ?, ?)";
my $sth = $dbh->prepare($sql);

# $data_in is the raw PDF file data
$logger->log("File $name_in has " . length($data_in)
    . " bytes and hash " . sha256_hex($data_in));

$sth->execute($id_in, $name_in, $data_in, "application/pdf");

The log entry shows the correct size and hash, identical to what the
file looks like on disk.

May 28 14:54:17 dev middleware[26817]: File testdoc.pdf has 493392 bytes and hash  403da58f84328365c8bdb646bfa008f31b44f2c391dc5d40eefa6963bc49c991

But after retrieving the blob (either via the app or via CLI), the file
size is much larger (700562), the hash is different, and the file is
corrupt and cannot be opened by any reader.

MariaDB [pdfdb]> select data from $table where id = 48 into dumpfile "/tmp/48.dump";

# ls -l /tmp/48.dump
-rw-rw-rw- 1 mysql mysql 700562 May 28 14:56 /tmp/48.dump

I've tried re-encoding the file (with vim) to latin1, which results in
the original file size 493392, but still leaves the PDF corrupted.
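
The size change reported above would be consistent with every byte at or above 0x80 being treated as a Latin-1 character and re-encoded as two UTF-8 bytes (700562 - 493392 = 207170 such bytes, if that assumption holds). The following is a minimal sketch of that arithmetic in plain Perl, using only the core Encode module and no database. The sample bytes are made up for illustration, and the sketch does not reproduce what the driver actually does.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample: 4 ASCII bytes followed by 4 bytes >= 0x80.
my $raw = "%PDF" . "\xE2\xE3\xCF\xD3";

# Treat each byte as a Latin-1 character and encode it as UTF-8.
# ASCII bytes stay 1 byte; bytes >= 0x80 become 2 bytes each.
my $encoded = encode('UTF-8', $raw);

# Reverse the step: decode the UTF-8, then encode back to Latin-1 bytes.
my $restored = encode('ISO-8859-1', decode('UTF-8', $encoded));

printf "raw: %d bytes, encoded: %d bytes, restored: %d bytes\n",
    length($raw), length($encoded), length($restored);
# prints: raw: 8 bytes, encoded: 12 bytes, restored: 8 bytes
```

Only the four high bytes grow, which is why a binary file with many such bytes ends up noticeably larger while plain ASCII text would be unaffected.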


I assume that this is a bug in DBD::mysql.

System: Gentoo Linux ~amd64, kernel 4.10.5-gentoo
        Perl 5.24.1
        DBD::mysql 4.42.0


Thanks for looking into this.

Markus


From [email protected] on 2017-06-16 13:03:10:

Duplicate of:
https://rt.cpan.org/Public/Bug/Display.html?id=120953
https://github.com/perl5-dbi/DBD-mysql/issues/107

See also for more details:
https://rt.cpan.org/Ticket/Display.html?id=25590
https://rt.cpan.org/Ticket/Display.html?id=60987
https://rt.cpan.org/Ticket/Display.html?id=53130
https://rt.cpan.org/Ticket/Display.html?id=87428

@markuswernig

Hi all

I just wanted to ask which direction you have decided to take with the UTF-8 support in DBD::mysql.

I see that 4.48.0 is on CPAN.
My installed version on Gentoo is now 4.44.0. The old behaviour still seems to be there: I have not changed my code yet, and it still works.

I followed the discussion on https://www.nntp.perl.org/group/perl.dbi.users/2017/09/msg37443.html, but couldn't find out what the final course of action will be.

Can you please shed some light?
Thanks /markus

pali commented Jan 27, 2019

Hi @markuswernig! Based on the discussion in #117, this issue will never be fixed in DBD::mysql. That is also the reason why I forked DBD::mysql into DBD::MariaDB (https://metacpan.org/pod/DBD::MariaDB), where the problem with Unicode and binary parameters is fixed.

dveeden added the utf8 (Unicode and UTF-8 handling) label on Oct 5, 2023