DBD-mysql-4.42.0 encodes binary blobs when storing [rt.cpan.org #121921] #217

Closed
@mbeijen

Migrated from rt.cpan.org#121921 (status was 'open')

From [email protected] on 2017-05-28 13:22:04:

Hi all

I have an application that, among other things, has to store PDF files in a
MySQL database. This has been running for almost 10 years now.

After upgrading DBD-mysql to 4.42.0, the PDF files get corrupted when
storing them to the db. It appears that they are somehow encoded in a
character set (presumably utf8), even though the column definition is
"mediumblob".

4.41.0 and earlier versions do not show that behaviour; downgrading
DBD::mysql without any other changes restores the correct behaviour.


Here is how the db is connected:

my $dbh = DBI->connect(${dsn},
            ${username},
            ${passwd},
            { RaiseError => 1,
              AutoCommit => 1,
              AutoInactiveDestroy => 1,
              mysql_auto_reconnect => 1,
              mysql_enable_utf8 => 1,
            }
          )
or die("DB connect failed: $DBI::errstr");

The code then goes on to insert data into tables like this one:

+------------+---------------------+------+-----+---------+-------+
| Field      | Type                | Null | Key | Default | Extra |
+------------+---------------------+------+-----+---------+-------+
| id         | bigint(20) unsigned | NO   | PRI | NULL    |       |
| name       | text                | NO   |     | NULL    |       |
| data       | mediumblob          | NO   |     | NULL    |       |
| doctype    | text                | NO   |     | NULL    |       |
+------------+---------------------+------+-----+---------+-------+

using a statement like this:

my $sql = "INSERT INTO $table (id, name, data, doctype)
  VALUES (?, ?, ?, ?)";
my $sth = $dbh->prepare($sql);

# $data_in is the raw PDF file data
$logger->log("File $name_in has " . length($data_in)
    . " bytes and hash " . sha256_hex($data_in));

$sth->execute($id_in, $name_in, $data_in, "application/pdf");

The log entry shows the correct size and hash, identical to what the
file looks like on disk.

May 28 14:54:17 dev middleware[26817]: File testdoc.pdf has 493392 bytes
and hash  403da58f84328365c8bdb646bfa008f31b44f2c391dc5d40eefa6963bc49c991

But after retrieving the blob (either via the app or via CLI), the file
size is much larger (700562), the hash is different, and the file is
corrupt and cannot be opened by any reader.

MariaDB [pdfdb]> select data from $table where id = 48 into dumpfile
"/tmp/48.dump";

# ls -l /tmp/48.dump
-rw-rw-rw- 1 mysql mysql 700562 May 28 14:56 /tmp/48.dump

I've tried re-encoding the file (with vim) to latin1, which results in
the original file size 493392, but still leaves the PDF corrupted.
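
The size increase would be consistent with every byte above 0x7F being treated as a Latin-1 character and stored as two UTF-8 bytes: 700562 - 493392 = 207170 extra bytes, which would mean roughly 42% of the bytes in the file have the high bit set. That this is what the driver actually does here is an assumption, not something verified in this report. A minimal sketch of the effect, using a made-up 256-byte buffer (`$raw`) rather than the real PDF and only Perl's built-in `utf8::encode` / `utf8::decode`:

```perl
use strict;
use warnings;

# Hypothetical stand-in for raw file data: one of each byte value 0x00..0xFF.
my $raw = pack 'C*', 0 .. 255;

# Treat every byte as a Latin-1 character and encode it as UTF-8.
# Bytes 0x00..0x7F stay one byte; bytes 0x80..0xFF become two bytes each.
my $inflated = $raw;
utf8::encode($inflated);

printf "raw: %d bytes, inflated: %d bytes\n",
    length($raw), length($inflated);

# Reversing the step restores the original bytes in this toy case.
my $restored = $inflated;
utf8::decode($restored);
print "round trip ok\n" if $restored eq $raw;
```

In this toy case the round trip is lossless. Whether the same holds for the dumped file depends on how the conversion back to latin1 was done, which this sketch does not cover.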


I assume that this is a bug in DBD::mysql.

System: Gentoo Linux ~amd64, kernel 4.10.5-gentoo
        Perl 5.24.1
        DBD::mysql 4.42.0


Thanks for looking into this.

Markus


From [email protected] on 2017-06-16 13:03:10:

Duplicate of:
https://rt.cpan.org/Public/Bug/Display.html?id=120953
https://github.com/perl5-dbi/DBD-mysql/issues/107

See also for more details:
https://rt.cpan.org/Ticket/Display.html?id=25590
https://rt.cpan.org/Ticket/Display.html?id=60987
https://rt.cpan.org/Ticket/Display.html?id=53130
https://rt.cpan.org/Ticket/Display.html?id=87428
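
For readers hitting the same symptom, one approach that may be worth trying is to bind the blob parameter with an explicit binary type instead of passing it straight to `execute`. The sketch below uses only standard DBI calls (`prepare`, `bind_param`, `execute`) and the `SQL_BLOB` constant from `DBI qw(:sql_types)`. The helper name `insert_document` is hypothetical, and whether DBD::mysql 4.42.0 skips the UTF-8 encoding for parameters bound this way is an assumption that this ticket does not confirm; the linked tickets are the place to check.

```perl
use strict;
use warnings;
use DBI qw(:sql_types);

# Hypothetical helper. $dbh is an already connected DBI handle and
# $table is a trusted table name (it is interpolated into the SQL).
sub insert_document {
    my ($dbh, $table, $id, $name, $data, $doctype) = @_;

    my $sth = $dbh->prepare(
        "INSERT INTO $table (id, name, data, doctype) VALUES (?, ?, ?, ?)"
    );

    $sth->bind_param(1, $id);
    $sth->bind_param(2, $name);
    # Assumption: declaring the parameter as SQL_BLOB makes the driver
    # pass the bytes through without character-set encoding.
    $sth->bind_param(3, $data, SQL_BLOB);
    $sth->bind_param(4, $doctype);

    return $sth->execute;
}
```

With the connection from the report this would be called as `insert_document($dbh, $table, $id_in, $name_in, $data_in, "application/pdf")`, after which the stored size can be compared against `length($data_in)`.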
