Posted on 2007-01-11 19:01:47-08 by exiftool in response to 4011
Re: Non Printable Ascii Chars In XMP
Hi Mark,

Thanks for the sample via email.

I think the thing to do is to assume Latin1 coding unless otherwise specified. This should fix the problem with your sample image at least. It is a rather significant change to start translating IPTC text, but I hope I have done it in a way that won't break things for too many people, and hopefully it will solve more problems than it creates.

The strategy now is to convert IPTC text if the CodedCharacterSet is recognized, and to assume Latin1 if the CodedCharacterSet tag doesn't exist. The ISO 2022 escape sequences used to switch between different codings are not yet supported, and the text is assumed to be all in a single character set. Also, when creating a new IPTC record from scratch, a CodedCharacterSet value of "UTF8" is written by default.
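As a rough illustration of that strategy (not ExifTool's actual code, which is Perl), here is a minimal Python sketch. The function names `detect_charset` and `iptc_to_utf8` are made up for this example, and it assumes the raw CodedCharacterSet value is one of the ISO 2022 escape sequences quoted in the FAQ text further down (ESC % G for UTF-8, ESC . A for Latin1):

```python
from typing import Optional

ESC = b"\x1b"
UTF8_SEQ = ESC + b"%G"    # "ESC % G": UTF-8
LATIN1_SEQ = ESC + b".A"  # "ESC . A": Latin1 / ISO-8859-1


def detect_charset(coded_character_set: Optional[bytes]) -> Optional[str]:
    """Return 'utf-8', 'latin-1', or None if the value is not recognized.

    Hypothetical helper: None as *input* means the tag does not exist.
    """
    if coded_character_set is None:
        return "latin-1"  # tag missing: assume Latin1
    if coded_character_set == UTF8_SEQ:
        return "utf-8"
    if coded_character_set == LATIN1_SEQ:
        return "latin-1"
    return None  # any other coding: not recognized


def iptc_to_utf8(raw: bytes, coded_character_set: Optional[bytes]) -> bytes:
    """Translate stored IPTC text to UTF-8 where the coding is recognized."""
    if detect_charset(coded_character_set) == "latin-1":
        # Every byte is a valid Latin1 character, so this cannot fail.
        return raw.decode("latin-1").encode("utf-8")
    # Already UTF-8, or an unrecognized coding: no translation is done.
    return raw
```

For example, `iptc_to_utf8(b"caf\xe9", None)` gives `b"caf\xc3\xa9"`, while text tagged with any other escape sequence comes back byte-for-byte unchanged. Mixed codings within one string are not handled, matching the single-character-set assumption above.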

The new version will require a lot of testing since this is a fairly significant change, and I would appreciate your help with this effort. I have uploaded a 6.70 pre-release here for you to play with.

Note that the translations are only performed if the coding is Latin1 or UTF8. Otherwise no translation is done. This will all be spelled out in the new FAQ #10, which will read:

IPTC: IPTC text is converted only for recognized values of the IPTC:CodedCharacterSet tag. Currently recognized encodings are UTF-8 ("UTF8" or "ESC % G") and Latin1/ISO-8859-1 ("Latin" or "ESC . A"). "Latin" is assumed if the CodedCharacterSet tag is missing. No translation is performed for all other values of CodedCharacterSet. When reading, text is translated to UTF-8 by default, or Latin1 with the -L option. When writing, the inverse translation is performed. When creating a new IPTC record, ExifTool automatically sets CodedCharacterSet to "UTF8" unless otherwise specified. This causes all text strings to be stored in UTF-8, which is the preferred encoding.
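To make the read/write symmetry concrete, here is a small Python sketch of the two directions. It is only a model of the behaviour described in the FAQ text, not ExifTool's implementation: `transcode`, `read_text`, `write_text` and the `latin1` flag (standing in for the -L option) are names invented for this example, and replacing unmappable characters with "?" is an assumption of the sketch rather than something stated above.

```python
from typing import Optional


def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from one encoding to another (hypothetical helper)."""
    if src == dst:
        return data
    # Assumption: characters that cannot be mapped are replaced.
    return data.decode(src, errors="replace").encode(dst, errors="replace")


def read_text(stored: bytes, stored_charset: Optional[str],
              latin1: bool = False) -> bytes:
    """Stored IPTC bytes -> external text (UTF-8, or Latin1 if latin1=True).

    stored_charset is 'utf-8', 'latin-1', or None for an unrecognized
    CodedCharacterSet, in which case no translation is performed.
    """
    if stored_charset is None:
        return stored
    external = "latin-1" if latin1 else "utf-8"
    return transcode(stored, stored_charset, external)


def write_text(value: bytes, stored_charset: Optional[str],
               latin1: bool = False) -> bytes:
    """External text -> stored IPTC bytes (the inverse of read_text)."""
    if stored_charset is None:
        return value
    external = "latin-1" if latin1 else "utf-8"
    return transcode(value, external, stored_charset)
```

With a Latin1 record, `read_text(b"caf\xe9", "latin-1")` returns the UTF-8 bytes `b"caf\xc3\xa9"`, and passing that result to `write_text(..., "latin-1")` restores the original `b"caf\xe9"`.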

- Phil
Direct Responses: 4066 | 4068 | 4072