Posted on 2009-05-04 22:05:01-07 by pixelpicker
Separating languages in keywords & caption...
Hello to all, and a special hello to Phil.

After racking my brain without finding a solution, I'm posting my question:

(I posted something related earlier, about copying the captions of different images into one new image, but I can't write a script that does the reverse.)

I have a lot of images with IPTC headline, keywords and captions in the following format:

headline: language1 blabla blabla | language1 blabla blabla
caption: language1 blabla blabla | language1 blabla blabla
Keyword1: language1 bla1 | language1 bla2
Keyword2: language1 bla3 | language1 bla4
Keyword3: language1 bla5 | language1 bla6 etc.

In some other applications, these keywords appear comma-separated, like this: language1 bla1 | language1 bla2, language1 bla3 | language1 bla4, language1 bla5 | language1 bla6

Each image is keyworded in two languages. The two language texts in the headline and caption are separated by " | " (a pipe), and the two terms in each keyword are also separated by a pipe.

This is a nightmare, because it produces an information salad :)

I read in the ExifTool documentation on XMP that it is possible to store separate languages in XMP fields with a language code, e.g. "-en" for English. VERY nice! This is what I would like to do with my mixed-language images.

The logic would be that ExifTool takes an image and
1) looks for the " | " pipe in the three fields above,
2) copies the part before the pipe to the corresponding XMP field for language1, and
3) copies the part after the pipe to the corresponding XMP field for language2.

The result would be an image file with separated languages in XMP.
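A minimal sketch of the splitting step described above, in plain Perl (the language ExifTool config files are written in) and outside ExifTool. It uses Perl's built-in split on the first pipe rather than any ExifTool feature; the helper name split_languages is hypothetical, and writing the two halves to XMP is not shown.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ExifTool): split a bilingual
# "text1 | text2" value at the first pipe. Returns the two halves with
# surrounding whitespace removed, or an empty list if there is no pipe.
sub split_languages {
    my ($value) = @_;
    return () unless defined $value && index($value, '|') >= 0;
    # LIMIT of 2: any further pipes stay inside the second half
    my ($first, $second) = split /\s*\|\s*/, $value, 2;
    s/^\s+|\s+$//g for $first, $second;    # trim both halves in place
    return ($first, $second);
}

my ($lang1, $lang2) = split_languages('Auto | car');
print "language1: $lang1\nlanguage2: $lang2\n";
```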

Could anybody give me a hint on how to create the script for this?

Many greetings from
pixelpicker
Posted on 2009-05-04 22:59:59-07 by pixelpicker in response to 10589
Re: Separating languages in keywords & caption...
Oops, sorry: the scheme for an image of course looks like this:

headline: language1 blabla blabla | language2 blabla blabla
caption: language1 blabla blabla | language2 blabla blabla
Keyword1: language1 bla1 | language2 bla2
Keyword2: language1 bla3 | language2 bla4
Keyword3: language1 bla5 | language2 bla6 etc.

In some other applications, the keywords appear comma-separated, like this: language1 bla1 | language2 bla2, language1 bla3 | language2 bla4, language1 bla5 | language2 bla6

Many greetings from
pixelpicker
Posted on 2009-05-05 11:33:55-07 by exiftool in response to 10591
Re: Separating languages in keywords & caption...
Unfortunately, only the caption (XMP-dc:Description) supports alternate languages in XMP. The keywords (XMP-dc:Subject) and headline (XMP-photoshop:Headline) do not support alternate languages. (Only lang-alt type tags support alternate languages in XMP -- see the tag name documentation for details.)

Given this limitation, do you still want to try to separate the caption languages?

- Phil
Posted on 2009-05-05 14:31:08-07 by pixelpicker in response to 10596
Re: Separating languages in keywords & caption...
Hello Phil! I hope you had a wonderful vacation. Welcome back :)

Yes, I need to do something to separate the keyword languages.
Since it isn't possible to write both languages completely into XMP, there's no point in using XMP. Sad.
I've found half of a solution:
I use your user-defined tags in the config file. For the caption it looks like this:

MyCaptionB => {
    Require => 'Caption-Abstract',
    # keep only the text before the "| " separator
    ValueConv => q{ $val =~ s/([^\|]*)\| .*/$1/ ? $val : undef },
},

As far as I can tell, this cuts off everything after the | separator. (I say "as far as I can tell" because I found the expression
s/([^\|]*)\| .*/\1/
on the net, and after some experimenting it seems to work.)
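To check what this substitution actually does, the ValueConv can be tried as a plain Perl sub outside ExifTool. The sub name my_caption_b is hypothetical, and \1 is written as $1, the preferred form in a Perl replacement. One thing the sketch makes visible: the kept text still ends with the space that stood before the pipe.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the MyCaptionB ValueConv, so the substitution
# can be tried on sample strings. Returns undef when the value contains
# no "| " (pipe followed by a space), as the original does.
sub my_caption_b {
    my ($val) = @_;
    return $val =~ s/([^\|]*)\| .*/$1/ ? $val : undef;
}

for my $caption ('Auto | car', 'Auto') {
    my $out = my_caption_b($caption);
    # brackets make any trailing whitespace in the result visible
    printf "%-12s -> %s\n", "'$caption'", defined $out ? "[$out]" : 'undef';
}
```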

What's still missing is the reverse: a script that cuts off everything before the |.

One could keep the bilingual images as the originals and, when a particular language is needed, create a copy of the image and delete the unneeded language in the copy. The command for copying while deleting everything after the | should look something like this:

exiftool -o DIR "-caption-abstract<mycaptionb" DIR
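For context, a command like the one above only sees MyCaptionB if the tag is defined in an ExifTool config file. A minimal sketch of such a file, following the layout of the sample config that ships with ExifTool; where the file lives (the default .ExifTool_config in the home directory, or a file passed with -config as the first option) is worth confirming in the ExifTool documentation.

```perl
# Sketch of an ExifTool config file holding the MyCaptionB Composite tag.
# User-defined Composite tags are added to the Image::ExifTool::Composite
# table inside %Image::ExifTool::UserDefined.
%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Composite' => {
        MyCaptionB => {
            Require   => 'Caption-Abstract',
            # keep only the text before the "| " separator;
            # returning undef means the tag is not created
            ValueConv => q{ $val =~ s/([^\|]*)\| .*/$1/ ? $val : undef },
        },
    },
);

1;    # config files must end by returning true
```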


Do you have any idea how to do the reverse cut?

Many greetings
pixelpicker
Posted on 2009-05-05 15:10:17-07 by exiftool in response to 10599
Re: Separating languages in keywords & caption...
I would suggest pulling out the languages by name, maybe something like this:

CaptionLanguage1 => {
    Require => 'Caption-Abstract',
    ValueConv => '$val =~ /(^|\|)\s*language1\s+(.*?)\s*(\||$)/si ? $2 : undef',
},
CaptionLanguage2 => {
    Require => 'Caption-Abstract',
    ValueConv => '$val =~ /(^|\|)\s*language2\s+(.*?)\s*(\||$)/si ? $2 : undef',
},

Here, the regular expression matches "language1" or "language2" (case insensitive), then takes all text following this up to the "|" symbol or the end of the string.
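A quick way to try this expression outside ExifTool is a plain Perl sub that takes the language name as a parameter. The sub name caption_for_language is hypothetical, and quotemeta is an addition so the name is matched literally; the rest of the pattern is the one above.

```perl
use strict;
use warnings;

# Hypothetical helper applying the CaptionLanguageN expression for a
# given language name (e.g. 'language1').
sub caption_for_language {
    my ($val, $lang) = @_;
    my $name = quotemeta $lang;    # match the name literally
    return $val =~ /(^|\|)\s*$name\s+(.*?)\s*(\||$)/si ? $2 : undef;
}

my $caption = 'language1 bla1 | language2 bla2';
my $first   = caption_for_language($caption, 'language1');    # 'bla1'
my $second  = caption_for_language($caption, 'LANGUAGE2');    # 'bla2' (/i ignores case)
print "$first\n$second\n";
```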

- Phil
Posted on 2009-05-05 15:17:22-07 by exiftool in response to 10600
Re: Separating languages in keywords & caption...
I should mention that this will be more complex for the Keywords since they are a list-type tag. In this case, you may have to loop through elements in the array:

KeywordsLanguage1 => {
    Require => 'Keywords',
    ValueConv => q{
        my @vals = ref $val ? @$val : ($val);
        foreach $val (@vals) {
            $val =~ /(^|\|)\s*language1\s+(.*?)\s*(\||$)/ and $val = $2;
        }
        return \@vals;
    },
},

- Phil
Posted on 2009-05-05 15:27:29-07 by exiftool in response to 10601
Re: Separating languages in keywords & caption...
Here is maybe a better way to do the keywords. The above example returned the entire keyword string if the language didn't exist, while this version returns nothing:

KeywordsLanguage1 => {
    Require => 'Keywords',
    ValueConv => q{
        my @vals;
        foreach (ref $val eq 'ARRAY' ? @$val : $val) {
            push @vals, $2 if /(^|\|)\s*language1\s+(.*?)\s*(\||$)/;
        }
        return @vals ? \@vals : undef;
    },
},
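To see the difference between the two KeywordsLanguage1 versions, both bodies can be run as ordinary Perl subs on a made-up keyword list. The sub names are hypothetical, and the regexes are kept as posted, without "si".

```perl
use strict;
use warnings;

# First version: rewrites matching elements in place, so a keyword
# without "language1" keeps its whole original text.
sub keywords_v1 {
    my ($val) = @_;
    my @vals = ref $val ? @$val : ($val);
    foreach $val (@vals) {    # $val is aliased to each element
        $val =~ /(^|\|)\s*language1\s+(.*?)\s*(\||$)/ and $val = $2;
    }
    return \@vals;
}

# Second version: collects only the matches; undef if nothing matched.
sub keywords_v2 {
    my ($val) = @_;
    my @vals;
    foreach (ref $val eq 'ARRAY' ? @$val : $val) {
        push @vals, $2 if /(^|\|)\s*language1\s+(.*?)\s*(\||$)/;
    }
    return @vals ? \@vals : undef;
}

my $keywords = ['language1 bla1 | language2 bla2', 'language2 bla4'];
print 'v1: ', join(', ', @{ keywords_v1($keywords) }), "\n";
print 'v2: ', join(', ', @{ keywords_v2($keywords) }), "\n";
```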

- Phil
Posted on 2009-05-05 15:31:08-07 by exiftool in response to 10602
Re: Separating languages in keywords & caption...
Oops. I forgot to add the "si" modifiers to the regular expressions in the last 2 examples: "s" lets the match span newlines in the text, and "i" makes it case-insensitive.
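A small made-up caption shows what each modifier contributes: without "i" the capitalised label does not match, and with "i" but without "s" the lazy (.*?) cannot cross the line break, so there is still no match.

```perl
use strict;
use warnings;

# Made-up caption: the language1 text spans two lines and the label
# is capitalised differently from the pattern.
my $val = "Language1 first line\nsecond line | language2 bla2";

# Same expression with three sets of modifiers. Each result is copied
# into its own scalar straight away, before $2 is reused by the next match.
my $none = $val =~ /(^|\|)\s*language1\s+(.*?)\s*(\||$)/   ? $2 : undef;
my $i    = $val =~ /(^|\|)\s*language1\s+(.*?)\s*(\||$)/i  ? $2 : undef;
my $si   = $val =~ /(^|\|)\s*language1\s+(.*?)\s*(\||$)/si ? $2 : undef;

for ([none => $none], [i => $i], [si => $si]) {
    my ($mods, $got) = @$_;
    print "$mods: ", (defined $got ? "[$got]" : 'no match'), "\n";
}
```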

- Phil
Posted on 2009-05-05 21:14:45-07 by pixelpicker in response to 10603
Re: Separating languages in keywords & caption...

WOW PHIL! Thanks for your quick and elegant solution, as always :)

The idea of distinguishing the languages by name, too, is very helpful.

But I didn't quite understand how to use the code. I must have made a mistake, because it doesn't do what it should here.

If I understand correctly, your code for CaptionLanguage1 and CaptionLanguage2 cuts everything after the |, or goes to the end when there is no pipe? But what does "language1" do in this part:
.../(^|\|)\s*language1\...
Is it a variable?

What I did was put your CaptionLanguage1 code in the config and rename the necessary parts to "english" to extract the second language.
But ExifTool says: 1 directories scanned 0 images updated :(

Any idea what's wrong?

Have a good day.

Greetings from
pixelpicker
Posted on 2009-05-05 21:22:01-07 by pixelpicker in response to 10606
Re: Separating languages in keywords & caption...
Sorry, I meant:
...CaptionLanguage1 and CaptionLanguage2 cut everything before the |?
Because that's what's needed.

Greetings from
pixelpicker
Posted on 2009-05-06 11:31:11-07 by exiftool in response to 10607
Re: Separating languages in keywords & caption...
The expression for CaptionLanguage1 looks for the string "language1", and takes the text after this (up to the "|" or the end of the string, whichever comes first). You need to change "language1" in this expression to be the name of the actual language you want.

> exiftool a.jpg -caption-abstract -captionlanguage1 -captionlanguage2
Caption-Abstract                : language1 bla1 | language2 bla2
Caption Language 1              : bla1
Caption Language 2              : bla2

- Phil
Posted on 2009-05-06 13:58:03-07 by pixelpicker in response to 10610
Re: Separating languages in keywords & caption...
OK, there was a misunderstanding. I didn't explain clearly what the metadata looks like.

It's like this:

Caption-Abstract : Auto | car
Caption Language 1 : Auto
Caption Language 2 : car

There's no "English" or other language name in front.
I tried to modify your script. Here's what I found so far:

CaptionEnglish => {
    Require => 'Caption-Abstract',
    ValueConv => '$val =~ /(^|\|)\s*\s+(.*?)\s*(\||$)/si ? $2 : undef',
},

This produces "car". Very good, nearly done :)

But:
CaptionDeutsch => {
    Require => 'Caption-Abstract',
    ValueConv => '$val =~ /(^|\|)\s*|\s+(.*?)\s*(\||$)/si ? $2 : undef',
},

creates nothing :(
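The likely reason CaptionDeutsch creates nothing: the second | in its pattern is not escaped, so it splits the whole expression into two alternatives, and the first alternative matches an empty string at the start of the caption. A minimal check in plain Perl, using the sample caption from this post:

```perl
use strict;
use warnings;

my $val = 'Auto | car';

# The unescaped | splits the pattern into two alternatives:
#   (^|\|)\s*    OR    \s+(.*?)\s*(\||$)
# The first one already matches an empty string at offset 0, so the match
# succeeds without group 2 taking part, and $2 stays undefined
# (which makes the ValueConv return undef).
if ($val =~ /(^|\|)\s*|\s+(.*?)\s*(\||$)/si) {
    # @- and @+ hold the start and end offsets of the whole match
    print 'matched from offset ', $-[0], ' to ', $+[0], "\n";
    print defined $2 ? "\$2 = [$2]\n" : "\$2 is undef\n";
}
```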
Could you help me one more time, please?

Many greetings from
pixelpicker
Posted on 2009-05-06 15:10:26-07 by exiftool in response to 10612
Re: Separating languages in keywords & caption...
Sorry, yes, I misunderstood. So you simply want to extract all the text before and after the "|":

CaptionLanguage1 => {
    Require => 'Caption-Abstract',
    ValueConv => '$val =~ /^(.*?)\s*\|/s ? $1 : undef',
},
CaptionLanguage2 => {
    Require => 'Caption-Abstract',
    ValueConv => '$val =~ /\|\s*(.*?)$/s ? $1 : undef',
},

It works like this:

> exiftool a.jpg -caption-abstract -captionlanguage1 -captionlanguage2
Caption-Abstract                : language1 bla1 | language2 bla2
Caption Language 1              : language1 bla1
Caption Language 2              : language2 bla2
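The same two expressions can be wrapped in plain Perl subs (hypothetical names) to try them outside ExifTool. This also shows how they behave with more than one pipe: CaptionLanguage1 stops at the first pipe, and CaptionLanguage2 returns everything after it.

```perl
use strict;
use warnings;

# Hypothetical wrappers around the two ValueConv expressions.
sub caption_language1 {    # text before the first "|"
    my ($val) = @_;
    return $val =~ /^(.*?)\s*\|/s ? $1 : undef;
}

sub caption_language2 {    # text after the first "|"
    my ($val) = @_;
    return $val =~ /\|\s*(.*?)$/s ? $1 : undef;
}

for my $caption ('Auto | car', 'a | b | c') {
    my $l1 = caption_language1($caption);
    my $l2 = caption_language2($caption);
    print "'$caption' -> [$l1] [$l2]\n";
}
```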

- Phil
Posted on 2009-05-06 17:35:36-07 by pixelpicker in response to 10613
Re: Separating languages in keywords & caption...

************* Yes - this is the right thing!! *************
************* Works like a machine!! *************
************* Thank you very much Phil. *************

For the keywords, I adapted your code like this:

KeywordsDeutsch => {
    Require => 'Keywords',
    ValueConv => q{
        my @list = ref $val ? @$val : ($val);
        my $changed;
        s/([^\|]*)\| .*/$1/ and $changed = 1 foreach @list;
        return $changed ? \@list : undef;
    },
},
KeywordsEnglish => {
    Require => 'Keywords',
    ValueConv => q{
        my @vals;
        foreach (ref $val eq 'ARRAY' ? @$val : $val) {
            push @vals, $2 if /(^|\|)\s*\s+(.*?)\s*(\||$)/;
        }
        return @vals ? \@vals : undef;
    },
},

No clue how it works in detail, but it does exactly what it should :)))
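Roughly, the two keyword ValueConvs work like this. Below is a commented sketch as plain Perl subs (hypothetical names, \1 written as $1, sample keywords made up). One detail it makes visible: KeywordsDeutsch keeps the space before the pipe, so 'Auto | car' becomes 'Auto ' with a trailing space.

```perl
use strict;
use warnings;

# KeywordsDeutsch: in each keyword, keep the text before "| ".
# Keywords without "| " are left unchanged; undef if nothing changed.
sub keywords_deutsch {
    my ($val) = @_;
    my @list = ref $val ? @$val : ($val);    # copy the keyword list
    my $changed;
    s/([^\|]*)\| .*/$1/ and $changed = 1 foreach @list;
    return $changed ? \@list : undef;
}

# KeywordsEnglish: collect the text after the pipe. "\s*\s+" needs at
# least one whitespace character right after the "|".
sub keywords_english {
    my ($val) = @_;
    my @vals;
    foreach (ref $val eq 'ARRAY' ? @$val : $val) {
        push @vals, $2 if /(^|\|)\s*\s+(.*?)\s*(\||$)/;
    }
    return @vals ? \@vals : undef;
}

my $keywords = ['Auto | car', 'Hund | dog', 'Berlin'];
# brackets show the space left behind before the pipe
print join(', ', map { "[$_]" } @{ keywords_deutsch($keywords) }), "\n";
print join(', ', map { "[$_]" } @{ keywords_english($keywords) }), "\n";
```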

Hmm, but I have another question before I look at it in more detail:
Wouldn't it be possible to use ExifTool to create custom IPTC tags (caption, keywords, headline) for a second language? Of course this wouldn't be standard, but it could help.

Have a good evening.

Greetings from
pixelpicker
Posted on 2009-05-06 17:49:38-07 by exiftool in response to 10614
Re: Separating languages in keywords & caption...
Great.

Sure, you can create custom IPTC tags if you want. Of course, nothing but exiftool could ever read them... :)

If you want to define custom tags with future compatibility in mind, XMP is the better choice.
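A sketch of what a custom XMP namespace could look like in the config file, following the pattern of the sample config shipped with ExifTool. The prefix mylang, the namespace URI and the tag names HeadlineAlt and Keywords2 are all made up for illustration. A lang-alt tag such as HeadlineAlt can then take language suffixes like -en or -de, which would cover the headline case that XMP-photoshop:Headline cannot.

```perl
# Sketch of an ExifTool config file defining a custom XMP namespace.
# "mylang", the URI and the tag names are hypothetical examples.
%Image::ExifTool::UserDefined = (
    # register the new namespace table under the main XMP table
    'Image::ExifTool::XMP::Main' => {
        mylang => {    # must match the NAMESPACE prefix below
            SubDirectory => {
                TagTable => 'Image::ExifTool::UserDefined::mylang',
            },
        },
    },
);

%Image::ExifTool::UserDefined::mylang = (
    GROUPS    => { 0 => 'XMP', 1 => 'XMP-mylang', 2 => 'Image' },
    NAMESPACE => { 'mylang' => 'http://ns.example.com/mylang/1.0/' },
    WRITABLE  => 'string',    # default type for tags in this table
    # language-alternative headline (e.g. HeadlineAlt-en, HeadlineAlt-de)
    HeadlineAlt => { Writable => 'lang-alt' },
    # unordered list of keywords for the second language
    Keywords2   => { List => 'Bag' },
);

1;    # config files must end by returning true
```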

- Phil
Posted on 2009-05-06 18:59:29-07 by pixelpicker in response to 10615
Re: Separating languages in keywords & caption...
As long as there is ExifTool, who cares? :)
But I will look deeper into this XMP thing at some point.

For now, thank you, and have a good time.

Greetings from
pixelpicker :)