I wanted to build an awesome place for people to discuss module-specific issues, but I don't have any more time for this, and there are much better places to discuss Perl-related issues. I'd recommend asking your question on Stack Overflow or on Perl Monks.
Posted on 2010-09-24 23:46:00.692996-07 by crult
Question: Handling large number of XML with XML-Twig
Hello, I have a large number of XML files in a folder. I want to read each XML file and extract its content to a new .txt file. I'm only interested in the content of the <Texte> tag, and I want to create one .txt file for each of my XML files. I use the Perl modules XML::Twig and XML::Simple. Here is the code I have so far:
my $xml_dir="C:\xmlperl";
my $output="C:\xmlperl\output.txt";
my $file = $ARGV[0];
opendir(DIR,$xml_dir) || die;
my @TranscriptsList = grep(/xml$/, readdir(DIR));
closedir(DIR);
foreach $file (@TranscriptsList) {
    my $twig= new XML::Twig(TwigRoots => {Texte => 1});
    $twig->parsefile($file);
    $twig->print;
    open XMLOUT, '>>C:\xmlperl\output.txt';
    $twig->print(\*XMLOUT) or die;
    close XMLOUT;
}
As you can see, I can only print the XML content to a single .txt file. Can you help me, please? Thank you in advance.
Direct Responses: 12959
Posted on 2010-09-25 03:16:52.794842-07 by mirod in response to 12958
Re: Question: Handling large number of XML with XML-Twig

Well, you have a single output file, C:\xmlperl\output.txt, so the code outputs everything to it.

You need to open a new output file for each input file.

my @TranscriptsList = glob( "$xml_dir/*.xml"); # easier than using readdir
foreach my $xml_file (@TranscriptsList) {
    # create a text file name from the input file name and open it
    my $text_file= $xml_file;
    $text_file=~ s{\.xml$}{\.txt};
    open( my $text_fh, '>', $text_file) or die "cannot create $text_file: $!";
    # I assume you only want the text, not the markup (tags), otherwise you could do
    # $_->print( $text_fh) to also print the markup
    my $twig= XML::Twig->new(twig_roots => {Texte => sub { print {$text_fh}, $_->text; } })
                       ->parsefile( $xml_file);
}

Since 'Texte' is a French tag name, I suspect your data contains accented characters and you might run into encoding problems, so you may need to open the output file in UTF-8 mode.
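A minimal sketch of what "UTF-8 mode" means in practice, using only core Perl. As far as I know, XML::Twig (through XML::Parser) hands back decoded character strings, so the encoding has to be chosen when the output handle is opened; the `:encoding(UTF-8)` layer does that. The helper name `write_utf8` and the sample word are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a character string to $path; the :encoding(UTF-8) layer
# turns the characters into UTF-8 bytes on the way out.
sub write_utf8 {
    my ( $path, $text ) = @_;
    open( my $fh, '>:encoding(UTF-8)', $path )
        or die "cannot create $path: $!";
    print {$fh} $text;
    close $fh or die "cannot close $path: $!";
    return;
}

# Usage: "\x{E9}l\x{E8}ve" is the word "eleve" with its two accents,
# written with escapes so this file itself stays plain ASCII.
my ( undef, $path ) = tempfile( UNLINK => 1 );
write_utf8( $path, "\x{E9}l\x{E8}ve\n" );
```

In the per-file loop above, the same change is just `open( my $text_fh, '>:encoding(UTF-8)', $text_file)`.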

You may also want to read a bit about modern Perl style: bareword filehandles (XMLOUT), indirect object notation (new XML::Twig) and opendir/readdir are not used much these days.

Direct Responses: 12960
Posted on 2010-09-25 10:45:54.839628-07 by crult in response to 12959
Re: Question: Handling large number of XML with XML-Twig
I tried it, but it gives me a syntax error at this line:
my $twig= XML::Twig->new(twig_roots => {Texte => sub { print {$text_fh}, $_->text; } })->parsefile( $xml_file);
I wrote it on a single line in my editor. When I run it, it says: syntax error near "},". I can't understand why. Thank you very much for the help.
Direct Responses: 12961 | 12962
Posted on 2010-09-25 13:07:54.447052-07 by crult in response to 12960
Re: Question: Handling large number of XML with XML-Twig
Also, there's another strange thing... When I tried my version on another input folder, the script either dies or gives the same output as before. That's the mystery: it doesn't print the content of the XML files in the folder I give it, but the content of the old files I gave it this morning! Can you also suggest a way to print the output to a single file, in the same style as the solution you gave me for multiple output files? Thanks!
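For the single-output-file variant asked about here, a minimal sketch, assuming XML::Twig is installed. The only real change from the per-file loop is that the output handle is opened once, before the loop, and with '>' rather than '>>', so each run starts from an empty file instead of piling onto earlier output. `extract_all_to_one` and the command-line usage are hypothetical, not part of the thread; XML::Twig is loaded with `require` inside the sub, and a plain `use XML::Twig;` at the top would work just as well.

```perl
use strict;
use warnings;

# Write the text of every <Texte> element found in @xml_files
# to ONE output file, one element per line. Returns the file count.
sub extract_all_to_one {
    my ( $out_file, @xml_files ) = @_;
    require XML::Twig;
    # '>' truncates: output from previous runs does not accumulate.
    open( my $out_fh, '>:encoding(UTF-8)', $out_file )
        or die "cannot create $out_file: $!";
    for my $xml_file (@xml_files) {
        XML::Twig->new(
            twig_roots => {
                Texte => sub {
                    my ( $twig, $elt ) = @_;
                    print {$out_fh} $elt->text, "\n";
                    $twig->purge;    # free the element we just handled
                },
            },
        )->parsefile($xml_file);
    }
    close $out_fh or die "cannot close $out_file: $!";
    return scalar @xml_files;
}

# Hypothetical usage: perl all_texte.pl C:/xmlperl/output.txt C:/xmlperl
if ( @ARGV == 2 ) {
    my ( $out_file, $xml_dir ) = @ARGV;
    extract_all_to_one( $out_file, glob("$xml_dir/*.xml") );
}
```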
Direct Responses: 12963
Posted on 2010-09-25 13:26:56.494305-07 by mirod in response to 12960
Re: Question: Handling large number of XML with XML-Twig

Oops, there's an extra comma that should not be there:

print {$text_fh} $_->text
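Putting the replies together, a sketch of the whole per-file script with the comma removed. Assumptions beyond the thread: the directory is passed on the command line with forward slashes (Perl accepts them on Windows) and contains no spaces, each <Texte> element's text ends with a newline, output is UTF-8, and `purge` is called to keep memory low on large files. The sub names and the script name are hypothetical.

```perl
use strict;
use warnings;

# Build "foo.txt" from "foo.xml" (extension matched case-insensitively).
sub text_file_for {
    my ($xml_file) = @_;
    ( my $text_file = $xml_file ) =~ s{\.xml\z}{.txt}i;
    return $text_file;
}

# Write the text of every <Texte> element of $xml_file to its own .txt file.
sub extract_texte {
    my ($xml_file) = @_;
    require XML::Twig;
    my $text_file = text_file_for($xml_file);
    open( my $text_fh, '>:encoding(UTF-8)', $text_file )
        or die "cannot create $text_file: $!";
    XML::Twig->new(
        twig_roots => {
            Texte => sub {
                my ( $twig, $elt ) = @_;
                print {$text_fh} $elt->text, "\n";    # no comma after {$text_fh}
                $twig->purge;    # free the element we just handled
            },
        },
    )->parsefile($xml_file);
    close $text_fh or die "cannot close $text_file: $!";
    return $text_file;
}

# Hypothetical usage: perl extract_texte.pl C:/xmlperl
for my $xml_dir (@ARGV) {
    extract_texte($_) for glob("$xml_dir/*.xml");
}
```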
Posted on 2010-09-25 13:32:12.633776-07 by mirod in response to 12961
Re: Question: Handling large number of XML with XML-Twig
The input folder is set in your code, not in mine: make sure your script reads the right one.
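One likely explanation for the "old files" output, offered as a guess since the thread doesn't confirm it: readdir returns bare names such as a.xml, and parsefile then looks for them in the current working directory rather than in $xml_dir. Two other details in the original script point the same way: '>>' appends, so output from earlier runs stays in output.txt, and inside double quotes "C:\xmlperl" is not the path it looks like, because \x starts a hex escape (single quotes or forward slashes avoid that). A sketch of the readdir fix in core Perl; `xml_files_in` is a hypothetical name.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full paths of the .xml files in $dir, sorted by name.
# readdir() yields bare entry names, so each one is joined with $dir;
# otherwise parsefile() would resolve it against the current directory.
sub xml_files_in {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "cannot open directory $dir: $!";
    my @paths = grep { -f $_ }
                map  { File::Spec->catfile( $dir, $_ ) }
                grep { /\.xml\z/i }
                readdir $dh;
    closedir $dh;
    my @sorted = sort @paths;
    return @sorted;
}

# Hypothetical usage: print "$_\n" for xml_files_in('C:/xmlperl');
```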
Direct Responses: 12964
Posted on 2010-09-25 13:45:33.872739-07 by crult in response to 12963
Re: Question: Handling large number of XML with XML-Twig
OK, thank you very much!