Closed
Description
Migrated from rt.cpan.org#121921 (status was 'open')
From [email protected] on 2017-05-28 13:22:04:
Hi all
I have an application that, among other things, has to store PDF files in a
MySQL database. It has been running for almost 10 years now.
After upgrading DBD::mysql to 4.42.0, the PDF files get corrupted when
they are stored in the database. It appears that they are somehow encoded in a
character set (presumably UTF-8), even though the column is defined as
"mediumblob".

Versions 4.41.0 and earlier do not show this behaviour; downgrading
DBD::mysql without any other changes restores the correct behaviour.
Here is how the db is connected:

    my $dbh = DBI->connect($dsn, $username, $passwd,
        {
            RaiseError           => 1,
            AutoCommit           => 1,
            AutoInactiveDestroy  => 1,
            mysql_auto_reconnect => 1,
            mysql_enable_utf8    => 1,
        }
    ) or die("DB connect failed: $DBI::errstr");

The code then goes on to insert data into tables like this one:
+------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------------+------+-----+---------+-------+
| id | bigint(20) unsigned | NO | PRI | NULL | |
| name | text | NO | | NULL | |
| data | mediumblob | NO | | NULL | |
| doctype | text | NO | | NULL | |
+------------+---------------------+------+-----+---------+-------+
like this:

    use Digest::SHA qw(sha256_hex);

    my $sql = "INSERT INTO $table (id, name, data, doctype)
               VALUES (?, ?, ?, ?)";
    my $sth = $dbh->prepare($sql);

    # $data_in is the raw PDF file data
    $logger->log("File $name_in has " . length($data_in)
        . " bytes and hash " . sha256_hex($data_in));

    $sth->execute($id_in, $name_in, $data_in, "application/pdf");

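In the INSERT above, `$data_in` is handed to `execute()` untyped, so the driver cannot tell the PDF bytes apart from text. One mitigation often suggested for binary data when `mysql_enable_utf8` is on is to bind the blob column explicitly with a binary SQL type such as `SQL_BLOB`. Treat the sketch below as an assumption rather than a confirmed fix for 4.42.0; `insert_pdf` is a hypothetical helper that reuses the table layout from this report.

```perl
use strict;
use warnings;

# Hypothetical helper: inserts one PDF row, binding the blob column as binary.
# Assumes $dbh is an open DBI handle (so DBI is already loaded) and $table is
# a trusted table name laid out like the one in this report.
sub insert_pdf {
    my ($dbh, $table, $id, $name, $data) = @_;

    my $sth = $dbh->prepare(
        "INSERT INTO $table (id, name, data, doctype) VALUES (?, ?, ?, ?)");

    # Text-like parameters are bound without an explicit type.
    $sth->bind_param(1, $id);
    $sth->bind_param(2, $name);

    # The PDF bytes are marked as binary; the assumption is that the driver
    # then skips the UTF-8 handling it applies under mysql_enable_utf8.
    $sth->bind_param(3, $data, DBI::SQL_BLOB());

    $sth->bind_param(4, 'application/pdf');

    # execute() with no arguments uses the values bound above.
    return $sth->execute();
}

1;
```

Usage would look like `insert_pdf($dbh, $table, $id_in, $name_in, $data_in);`. Binding each parameter individually keeps the binary type on the blob column only, so `name` and `doctype` are still sent as ordinary strings.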
The log entry shows the correct size and hash, identical to those of the
file on disk:

    May 28 14:54:17 dev middleware[26817]: File testdoc.pdf has 493392 bytes and hash 403da58f84328365c8bdb646bfa008f31b44f2c391dc5d40eefa6963bc49c991

But after retrieving the blob (either via the application or via the CLI),
the file is much larger (700562 bytes), the hash is different, and the file
is corrupt and cannot be opened by any reader:

    MariaDB [pdfdb]> select data from $table where id = 48 into dumpfile "/tmp/48.dump";

    # ls -l /tmp/48.dump
    -rw-rw-rw- 1 mysql mysql 700562 May 28 14:56 /tmp/48.dump

I've tried re-encoding the dumped file to latin1 (with vim), which brings it
back to the original size of 493392 bytes, but the PDF is still corrupt.
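The size difference fits the "presumably UTF-8" guess. Assuming each byte of the PDF was treated as a Latin-1 character and then encoded as UTF-8, every byte from 0x80 to 0xFF grows to two bytes while bytes below 0x80 stay at one. The growth here is 700562 − 493392 = 207170 bytes, which under that assumption would mean 207170 of the 493392 original bytes (about 42%) had the high bit set. The core-Perl sketch below uses hypothetical sample data, not the reporter's file, to show that size rule and that, under the same assumption, decoding the stored bytes as UTF-8 recovers the original. It only models the assumed conversion and does not account for the vim result above.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical stand-in for binary PDF data: every possible byte value once.
my $original = join '', map { chr($_) } 0 .. 255;

# Assumed corruption model: each byte is taken as a Latin-1 character and the
# resulting string is encoded as UTF-8 before it reaches the blob column.
my $stored = encode('UTF-8', $original);

# Bytes 0x80..0xFF become two bytes each; bytes below 0x80 stay as one,
# so the growth equals the number of high-bit bytes.
my $high_bytes = () = $original =~ /[\x80-\xff]/g;

printf "original: %d bytes, stored: %d bytes, high-bit bytes: %d\n",
    length($original), length($stored), $high_bytes;

# Under the same assumption, decoding the stored bytes as UTF-8 undoes the damage.
my $recovered = decode('UTF-8', $stored);
print $recovered eq $original ? "round trip OK\n" : "round trip FAILED\n";
```

Running it prints `original: 256 bytes, stored: 384 bytes, high-bit bytes: 128` followed by `round trip OK`.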
I assume that this is a bug in DBD::mysql.
System: Gentoo Linux ~amd64, kernel 4.10.5-gentoo
Perl 5.24.1
DBD::mysql 4.42.0
Thanks for looking into this.
Markus
From [email protected] on 2017-06-16 13:03:10:
Duplicate of:
https://rt.cpan.org/Public/Bug/Display.html?id=120953
https://github.com/perl5-dbi/DBD-mysql/issues/107
See also for more details:
https://rt.cpan.org/Ticket/Display.html?id=25590
https://rt.cpan.org/Ticket/Display.html?id=60987
https://rt.cpan.org/Ticket/Display.html?id=53130
https://rt.cpan.org/Ticket/Display.html?id=87428