Posted on 2005-05-17 03:03:38-07 by jdcook
How do I know which end tag goes with which start tag?
I am attempting to write a program that parses an HTML file and then does some text substitution, but I am running into a problem, perhaps because I don't understand how to use the HTML::TokeParser::Simple package. Here's what I am trying to do: loop through a file looking for <span> or <div> tags and check whether each one has a certain attribute (editable="true"). If it does, I want to grab all the text (including any nested tags) between that tag and the closing tag of the same tag "set". Some sample XHTML to illustrate:
<div class="content">
  <span editable="true" id="nvumaincontent">stuff</span>
  <span editable="true" optional="true" id="nvutest5"><span style="background: red;">more stuff</span>
  <p>test</p></span><span editable="true" id="nvu56">even more stuff
  <!-- this is the beginning of a comment -->
  </span>
  <div editable="true" optional="true" repeatable="true" movable="true" id="nvutest43">incredible boat loads of stuff
  <!-- this is another comment -->
  </div>
  <div editable="true" id="anotherblock4">an unbelievable quantity of stuff!
  <!-- yet another comment -->
  <div id="newtest">Yo, dude!</div>
  </div>
  <!-- end main content -->
</div>

I am mostly getting the results I want, with one exception. Where the markup above has more than one </span> or </div> in a row, I am only able to get the first of those tags. Is there anything I can do to tell whether a given </span> or </div> tag actually goes with the relevant opening tag? Here is my code:
use File::Find;
use strict;
use HTML::TokeParser::Simple;

#my $new_folder = 'new_html/';
my @html_docs = "test5.html";
our $spancontents="";
my @files;
my $ByteCount=0;
my $filelist="";
my $isflagon=0;
my $idflag;
my %spancontents;
my $templatelocation;
my $currentdoc;

foreach my $doc ( @html_docs ) {
    $currentdoc=$doc;
    my $p = HTML::TokeParser::Simple->new( file => $doc );
    while ( my $token = $p->get_token ) {
        if ($token->is_start_tag('span') or $token->is_start_tag('div')) {
            if ($token->get_attr('editable')=~/true/) {
                $isflagon=1;
                $idflag=$token->get_attr('id');
            }
        }
        if ( ($token->is_start_tag('span') and $isflagon) .. $token->is_end_tag('span') and $isflagon){
            my $text=$token->as_is;
            $spancontents.=$text.",";
            #next;
        }
        if ( ($token->is_start_tag('div') and $isflagon) .. $token->is_end_tag('div')){
            my $text=$token->as_is;
            $spancontents.=$text.",";
            #next; #not sure if needed, seems to mess things up
        }
        if (($token->is_end_tag('span') or $token->is_end_tag('div')) and $isflagon) {
            $isflagon=0;
            #$spancontents.=$token->as_is.","; #not sure if needed, seems to mess things up
            $spancontents{"$idflag"}.=$spancontents;
            $spancontents="";
        }
        if ($token->is_start_tag('html')) {
            my $attrs=$token->get_attr('templateref');
            $templatelocation=$attrs;
        }
    }
}

print "\n\n\n";
foreach my $value (keys %spancontents) {
    print "value is $value\n";
    print "\nMy $value = $spancontents{$value} \n\n-------------------------\n";
}

Here is some sample output using similar HTML as above:
value is anotherblock4

My anotherblock4 = <div editable="true" id="anotherblock4">,an, unbelievable quantity of stuff!
,<!-- yet another comment -->,
,<div id="newtest">,Yo, dude!,</div>,

-------------------------
value is nvutest43

My nvutest43 = <div editable="true" optional="true" repeatable="true" movable="true" id="nvutest43">,incredible boat loads of stuff
,<!-- this is another comment -->,
,</div>,

-------------------------
value is nvutest5

My nvutest5 = <span editable="true" optional="true" id="nvutest5">,<span style="background: red;">,more stuff,</span>,

-------------------------
value is nvumaincontent

My nvumaincontent = <span editable="true" id="nvumaincontent">,stuff,</span>,

-------------------------
value is nvu56

My nvu56 = <span editable="true" id="nvu56">,even more stuff
,<!-- this is the beginning of a comment -->,
,</span>,

-------------------------

Notice that there is only one div or span closing tag under sections nvutest5 and anotherblock4. There should be two of them (i.e. two divs or two spans). My bottom-line question is this: when I retrieve a closing tag with get_token and test it with is_end_tag, how can I tell which opening tag it belongs to? Thanks for any help you can give, and thanks for making this module available.
Joshua Cook
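For what it's worth, one common way to pair a closing tag with its opening tag in a token stream is to keep a depth counter: add one for each start tag with the same name as the tag that opened the block, subtract one for each matching end tag, and treat the end tag that brings the count back to zero as the real close. Below is a minimal sketch of that idea, not a tested drop-in fix. The helper name extract_editable and the sample markup are made up for illustration, the parser is fed a string instead of a file, the `//` operator assumes Perl 5.10 or later, and the markup is assumed to be balanced with no self-closing <span/> or <div/> tags.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper (not part of the module): returns a hash mapping the id
# of each editable <span>/<div> to its raw markup, outer tags included.
sub extract_editable {
    my ($html) = @_;
    my $p = HTML::TokeParser::Simple->new( string => $html );

    my %blocks;       # id => captured markup
    my $id;           # id of the block being captured; undef while idle
    my $tag;          # tag name (span or div) that opened the block
    my $depth = 0;    # number of currently open elements named $tag

    while ( my $token = $p->get_token ) {
        if ( !defined $id ) {
            # Idle: skip everything until an editable span or div starts.
            next unless $token->is_start_tag('span') || $token->is_start_tag('div');
            next unless ( $token->get_attr('editable') // '' ) eq 'true';
            $id          = $token->get_attr('id') // '';
            $tag         = $token->get_tag;
            $depth       = 1;
            $blocks{$id} = $token->as_is;
            next;
        }

        # Capturing: keep every token verbatim, nested tags included.
        $blocks{$id} .= $token->as_is;

        # Only tags with the same name as the opener change the depth.
        $depth++ if $token->is_start_tag($tag);
        $depth-- if $token->is_end_tag($tag);

        # Back at zero: this end tag is the one that closes the opener.
        undef $id if $depth == 0;
    }
    return %blocks;
}

# Usage with a small made-up sample modeled on the XHTML above.
my %found = extract_editable(
    '<div class="content"><span editable="true" id="nvutest5">'
  . '<span style="background: red;">more stuff</span><p>test</p></span></div>'
);
print "$_ = $found{$_}\n" for sort keys %found;
```

With this approach an editable block nested inside another editable block is captured as part of the outer one and does not get its own entry; tracking those separately would need a stack of open blocks in place of the single counter.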