Warning when using wide characters [rt.cpan.org #25436] #28

Open
toddr opened this issue May 11, 2017 · 0 comments


toddr commented May 11, 2017

Migrated from rt.cpan.org#25436 (status was 'open')

Requestors:

Attachments:

From [email protected] on 2007-03-14 17:00:48:

The following script:

#!/usr/bin/perl
use strict;
use warnings;
use YAML::Syck qw();
$YAML::Syck::ImplicitUnicode = 1;
YAML::Syck::DumpFile("/tmp/yaml.yml", "\x{20ac}");

would cause the warning:

Wide character in print at
/usr/perl5.8.7/lib/site_perl/5.8.7/i686-linux/YAML/Syck.pm line 51.

I guess that something like binmode(":utf8") in DumpFile would fix the
warning.

Regards,
    Slaven
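
The warning itself can be reproduced without YAML::Syck. The following is a minimal sketch using only core Perl and in-memory filehandles; the helper name print_with_mode is made up for illustration, and the sketch says nothing about what DumpFile does internally. It only shows that printing a string containing U+20AC to a handle without an encoding layer raises "Wide character in print", while a handle opened with :encoding(UTF-8) does not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of YAML::Syck): print $text to an in-memory
# filehandle opened with $mode, and return the warnings raised together
# with the bytes that ended up in the buffer.
sub print_with_mode {
    my ($mode, $text) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $buffer = '';
    open(my $fh, $mode, \$buffer) or die "Cannot open in-memory handle: $!";
    print {$fh} $text;
    close($fh) or die "Cannot close in-memory handle: $!";
    return (\@warnings, $buffer);
}

my $euro = "\x{20ac}";

# No layer: perl warns and writes its internal UTF-8 representation.
my ($plain_warnings, $plain_bytes) = print_with_mode('>', $euro);

# Explicit encoding layer: the character is encoded without a warning.
my ($utf8_warnings, $utf8_bytes) = print_with_mode('>:encoding(UTF-8)', $euro);

printf "no layer:   %d warning(s), %d byte(s)\n",
    scalar @$plain_warnings, length $plain_bytes;
printf "with layer: %d warning(s), %d byte(s)\n",
    scalar @$utf8_warnings, length $utf8_bytes;
```
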

From [email protected] on 2008-06-30 15:44:44:

On Wed Mar 14 13:00:48 2007, SREZIC wrote:
> The following script:
> 
> #!/usr/bin/perl
> use strict;
> use warnings;
> use YAML::Syck qw();
> $YAML::Syck::ImplicitUnicode = 1;
> YAML::Syck::DumpFile("/tmp/yaml.yml", "\x{20ac}");
> 
> would cause the warning:
> 
> Wide character in print at
> /usr/perl5.8.7/lib/site_perl/5.8.7/i686-linux/YAML/Syck.pm line 51.
> 
> I guess that something like binmode(":utf8") in DumpFile would fix the
> warning.
> 

The issue is still in 1.05.

Thinking again about it, it seems that the binmode call must not be
unconditional, but only used if $ImplicitUnicode is set.

Also, it is not clear what to do when DumpFile operates on an already
open filehandle. Let the user set binmode on the filehandle? Push the
utf8 layer before writing/reading and pop it afterwards?

Regards,
    Slaven
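
The "push a layer, write, pop it again" idea for an already open filehandle could look roughly like the sketch below. It is not code from YAML::Syck: the helper print_maybe_utf8 and its $want_utf8 argument (standing in for a setting such as $ImplicitUnicode) are hypothetical. It pushes :encoding(UTF-8) rather than :utf8, because the :pop pseudo-layer only removes real layers and does not undo the :utf8 flag.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;    # provides the flush method on lexical filehandles

# Hypothetical helper, not part of YAML::Syck. $want_utf8 stands in for a
# setting such as $YAML::Syck::ImplicitUnicode.
sub print_maybe_utf8 {
    my ($fh, $text, $want_utf8) = @_;
    if (!$want_utf8) {
        print {$fh} $text;
        return;
    }
    # Push a real encoding layer for the duration of this write only.
    binmode($fh, ':encoding(UTF-8)') or die "Cannot push encoding layer: $!";
    print {$fh} $text;
    # Flush explicitly so nothing is left buffered in the layer being removed.
    $fh->flush or die "Cannot flush handle: $!";
    # :pop removes the top-most layer, i.e. the one pushed above.
    binmode($fh, ':pop') or die "Cannot pop encoding layer: $!";
    return;
}

my $buffer = '';
open(my $fh, '>', \$buffer) or die "Cannot open in-memory handle: $!";
print {$fh} "price: ";
print_maybe_utf8($fh, "\x{20ac}\n", 1);
close($fh) or die "Cannot close in-memory handle: $!";
```

Whether modifying the layers of a caller's filehandle behind their back is acceptable is exactly the open design question raised above; the sketch only shows that it is mechanically possible.
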

From [email protected] on 2010-07-19 19:43:08:

I don't know enough about the YAML spec to know if non-utf8 wide chars are supported. An easy solution that would not round-trip well would be this patch.

From [email protected] on 2010-07-20 04:11:55:

I spoke with Avar about this. The plan is to update the documentation to clarify that you are expected to open the file handle as UTF-8 if you expect wide chars to be in the structure:

open(my $fh, ">:encoding(UTF-8)", "out.yml") or die "Cannot open out.yml: $!";
DumpFile($fh, $hashref);

From [email protected] on 2010-08-29 14:56:13:

On 2010-07-20 00:11:55, TODDR wrote:
> I spoke with Avar about this. The plan is to update the documentation
> to clarify that you are
> expected to open the file handle as UTF-8 if you expect wide chars to
> be in the structure:
>
> open(my $fh, ">:encoding(UTF-8)", "out.yml") or die "Cannot open out.yml: $!";
> DumpFile($fh, $hashref);
> 

Sorry, I have to re-open this ticket. Using this is not enough to get a
dump/load roundtrip working (see below).

Also, I don't like it that the user has to do something special to have
wide character serialization correct. I think there should be a way to
detect the presence of wide characters automatically and do the right thing?

Regards,
    Slaven


#!/usr/bin/perl -w

use strict;
use Test::More 'no_plan';
use YAML::Syck qw(DumpFile LoadFile);

my $test = ["\x{20ac}"];
open(my $fh, ">:encoding(UTF-8)", "/tmp/test.yml");
DumpFile $fh, $test;
close $fh or die $!;
my $test2 = LoadFile "/tmp/test.yml";
is_deeply($test2,$test);

__END__

$ perl5.12.0 /tmp/yamlsyck.pl
not ok 1
#   Failed test at /tmp/yamlsyck.pl line 12.
Wide character in print at /usr/perl5.12.0/lib/5.12.0/Test/Builder.pm
line 1753.
#     Structures begin differing at:
#          $got->[0] = '��¢����¬'
#     $expected->[0] = 'â¬'
1..1
# Looks like you failed 1 test of 1.
Exitcode 1
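
One plausible reading of the failure above is double encoding. This assumes that, without $ImplicitUnicode, Dump hands back a string of UTF-8 bytes which the :encoding(UTF-8) layer then encodes a second time, and that assumption is not verified against the YAML::Syck source here. The sketch below uses only the core Encode module to show that encoding twice and decoding once does not give back the original character.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

my $euro = "\x{20ac}";

# First encoding: one character becomes three UTF-8 bytes (E2 82 AC).
my $once = encode('UTF-8', $euro);

# Second encoding: each of those bytes is treated as a Latin-1 character
# and encoded again, which doubles the length.
my $twice = encode('UTF-8', $once);

# Decoding once only undoes the second step.
my $decoded_once = decode('UTF-8', $twice);

printf "encoded once:  %d bytes\n", length $once;
printf "encoded twice: %d bytes\n", length $twice;
printf "round trip ok: %s\n", ($decoded_once eq $euro ? "yes" : "no");
```
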

From [email protected] on 2010-08-30 17:13:12:

> Also, I don't like it that the user has to do something special to have
> wide character serialization correct. I think there should be a way to
> detect the presence of wide characters automatically and do the right thing?

As an English speaker, my wide character ignorance is vast. I'm open to suggestions, but the little I know is that auto-detection algorithms for UTF-8 are buggy at best.

What do you suggest?

From [email protected] on 2013-04-14 17:32:55:

On 2010-08-30 13:13:12, TODDR wrote:
> > Also, I don't like it that the user has to do something special to
> have
> > wide character serialization correct. I think there should be a way
> to
> > detect the presence of wide characters automatically and do the
> right thing?
> 
> As an english speaker, my wide character ignorance is vast. I'm open
> to suggestions but the little
> I know is that auto-detection algorithms for UTF8 are buggy at best.
> 
> What do you suggest?

I had a very brief look at the source code of YAML::Syck. The root problem is probably the use of SvPV and newSVpvn in perl_syck.h; it should rather use SvPVutf8 and newSVpvn_utf8. In that case, I think all the hacks with ImplicitUnicode, and the suggestion to use an encoding layer when doing I/O, could be removed.

Regards,
    Slaven