Posted on 2005-02-15 09:41:00-08 by dirk
Space character dropped in MARC::Charset::_unpack
The space character is dropped in the conversions to_utf8 and to_marc8. This happens because of the way the _unpack function is implemented. It uses the unpack template "A3", which removes trailing spaces. If the packed string starts with three spaces, the first field consists only of spaces, so all three are removed and the field comes back as an empty string.
Here is one possible way to fix the problem:
sub _unpack {
    my $x = shift;
    return( undef ) if ! defined( $x );
    # return ( unpack( 'A3A1A1', $x ) );
    # added:
    my ($s1, $s2, $s3) = unpack( 'A3A1A1', $x );
    $s1 =~ s/^$/ /;
    return ($s1, $s2, $s3);
}
Dirk Kinnaes K.U.Leuven LIBIS-Net
Posted on 2005-02-15 14:17:02-08 by jmcnamara in response to 116
Re: Space character dropped in MARC::Charset

> It uses pack/unpack pattern "A3" which removes trailing spaces.

I guess that you could also replace the unpack template with 'a3a1a1' since the "a" format won't strip trailing whitespace.
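To make the difference between the two templates concrete, here is a minimal sketch that uses only Perl's built-in unpack. The sample string "   ab" is an assumption chosen for illustration (three spaces followed by two one-character fields); it is not taken from the MARC::Charset source.

```perl
use strict;
use warnings;

# Illustrative input: a three-space field followed by two one-character fields.
my $packed = "   ab";

# 'A' strips trailing spaces (and nulls), so an all-space field becomes ''.
my @stripped = unpack( 'A3A1A1', $packed );

# 'a' returns the data unchanged, so the three spaces survive.
my @raw = unpack( 'a3a1a1', $packed );

printf "A3: [%s] [%s] [%s]\n", @stripped;   # A3: [] [a] [b]
printf "a3: [%s] [%s] [%s]\n", @raw;        # a3: [   ] [a] [b]
```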

John.
--
Posted on 2005-02-15 16:58:07-08 by edsu in response to 116
Re: Space character dropped in MARC::Charset

Can you send me a simple test that illustrates this bug? I'm trying this test, and it works fine...I guess I'm just missing the context where you discovered the problem.

my $charset = MARC::Charset->new();
is( $charset->to_utf8('   '), '   ', 'three spaces ok' );
Posted on 2005-02-16 09:09:55-08 by dirk in response to 119
Re: Space character dropped in MARC::Charset
Here is a simple test that illustrates the problem, using only ASCII characters:
use MARC::Charset;
my $charset = MARC::Charset->new();
print $charset->to_marc8("one blank") . "\n";
print $charset->to_utf8("one blank") . "\n";

This produces the following output:
oneblank
one blank
So the problem exists only for to_marc8, not for to_utf8 (sorry, I should have checked this first).
Posted on 2005-02-16 09:18:01-08 by dirk in response to 118
Re: Space character dropped in MARC::Charset
Well, the "a" format won't work in this case because trailing whitespace really has to be stripped. The one exception is that the first character should not be stripped away if the first three characters are all spaces.
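A small sketch of that trade-off, using only the built-in unpack. The helper name unpack_keep_blank and the sample strings are hypothetical, and the sketch assumes that a field shorter than three characters is padded with spaces; the helper follows the same idea as the fix in the first post.

```perl
use strict;
use warnings;

# Hypothetical helper: strip padding with 'A3', but turn an all-space
# first field back into a single space instead of an empty string.
sub unpack_keep_blank {
    my $x = shift;
    return unless defined $x;
    my ( $s1, $s2, $s3 ) = unpack( 'A3A1A1', $x );
    $s1 = ' ' if $s1 eq '';
    return ( $s1, $s2, $s3 );
}

# With 'a3' the padding of a short field is kept, which is not wanted here.
my ($padded_a) = unpack( 'a3a1a1', "x  ab" );    # "x  "

# The helper strips the padding but preserves a lone space.
my ($padded) = unpack_keep_blank("x  ab");       # "x"
my ($blank)  = unpack_keep_blank("   ab");       # " "

printf "[%s] [%s] [%s]\n", $padded_a, $padded, $blank;
```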

Dirk