Posted on 2008-03-09 18:11:14-07 by andreys
encoding::warnings works wrong
It seems that encoding::warnings does not work as advertised: when it is "use"d before any literals containing 8-bit characters, those literals are swallowed by the Perl compiler.
For example, the following script won't raise a warning:
(Note that in all the following examples the actual code uses the corresponding 8-bit characters instead of "\xc0\xc1\xc2\xc3\xc4\xc5". I could not post them to this forum.)
#!/usr/bin/perl
use encoding::warnings; # encoding::warnings is "use"d BEFORE 8-bit literals
my $str = "\xc0\xc1\xc2\xc3\xc4\xc5"; # first 6 letters of cyrillic alphabet in 8-bit encoding cp1251
my $yo = "\x{0401}"; # 7th letter of cyrillic alphabet, E with diaeresis
binmode STDOUT, ":encoding(utf8)"; # Implicitly set PerlIO layer on STDOUT
print "$str$yo\n"; # Concatenate byte- and unicode-strings.
To raise an "implicitly upgraded" warning the module must be imported after the problematic literals:
#!/usr/bin/perl
my $str = "\xc0\xc1\xc2\xc3\xc4\xc5"; # first 6 letters of cyrillic alphabet in 8-bit encoding cp1251
use encoding::warnings; # encoding::warnings is "use"d AFTER 8-bit literals
my $yo = "\x{0401}"; # 7th letter of cyrillic alphabet, E with diaeresis
binmode STDOUT, ":encoding(utf8)"; # Implicitly set PerlIO layer on STDOUT
print "$str$yo\n"; # Concatenate byte- and unicode-strings.
Since 8-bit encoded literals are widely scattered in legacy code, this module is almost useless in its current form. The problem lies in the following lines of its source code:
# Don't worry about source code literals.
sub cat_decode {
    my $self = shift;
    return $self->[LATIN1]->cat_decode(@_);
}
These lines implicitly pass the 8-bit source code through a latin-1 decoder. Once encoding::warnings is imported, all 8-bit source code compiled after that point is fed through this decoder. More dangerously, every 8-bit literal is converted to Unicode using the latin-1 converter in situations where no conversion would take place without "use encoding::warnings". This can lead to data corruption.
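To make the corruption concrete, here is a minimal sketch that decodes the same bytes with the core Encode module, once as latin-1 and once as cp1251. It does not use encoding::warnings or touch its internals; it only illustrates what a latin-1 interpretation does to cp1251 data. The helper name codepoints is made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Three cp1251 bytes: the first three letters of the Cyrillic alphabet.
my $bytes = "\xc0\xc1\xc2";

# Hypothetical helper: format a character string as a list of code points.
sub codepoints {
    my ($chars) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $chars;
}

my $as_latin1 = decode('iso-8859-1', $bytes);  # what a latin-1 decoder produces
my $as_cp1251 = decode('cp1251',     $bytes);  # what the script author meant

print 'latin-1: ', codepoints($as_latin1), "\n";  # U+00C0 U+00C1 U+00C2
print 'cp1251:  ', codepoints($as_cp1251), "\n";  # U+0410 U+0411 U+0412
```

The latin-1 result is accented Latin capitals rather than Cyrillic letters, so once such a string is written out through a UTF-8 layer the original text is no longer what ends up in the output.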
The solution is to add code to the cat_decode method of encoding::warnings that issues a warning (and does not convert the source!). The only problem is that it will warn about _every_ 8-bit literal, even those that will never be "implicitly upgraded" to Unicode.
The workaround is to place the "use encoding::warnings" line late in the program (which is not an option when the pragma is used in modules).
If the community is interested, I can send a simple patch to the cat_decode method that will work fine.