Posted on 2005-05-17 03:33:06-07 by ovid in response to 464
Re: How do I know which end tag goes with which start tag?

Hi Joshua

The problem with HTML is that it is inherently free-form, and stumbling across misnested tags can throw the best algorithms for a loop (no bad pun intended). Assuming your tags are properly nested, though, the best way of dealing with this is either to switch to HTML::TreeBuilder (which would let you treat the spans as nodes in a tree), or to maintain either a tag stack or a tag count. I've chosen the latter in the following Perl snippet, which prints everything inside the outermost span:

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple 3.13;

my $parser = HTML::TokeParser::Simple->new(handle => \*DATA);

while (my $token = $parser->get_token) {
    next unless $token->is_start_tag('span');
    my $html = get_element($parser, 'span');
    print $html;
}

# pass this the parser and the name of the tag you're interested in.
sub get_element {
    my ($parser, $tag) = @_;
    my $html      = '';
    my $more_tags = 0;
    while (my $token = $parser->get_token) {
        return $html if $token->is_end_tag($tag) && ! $more_tags;
        $more_tags++ if $token->is_start_tag($tag);
        $more_tags-- if $token->is_end_tag($tag);
        $html .= $token->as_is;
    }
    return $html;
}

__DATA__
<head>
<body>
<span>
<span foo="bar">
stuff
</span>
</span>
</body>
</head>
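For comparison, here is a rough sketch of the tag stack alternative mentioned above. It is only an illustration: the name get_element_with_stack and the list of void elements are my own assumptions, not part of HTML::TokeParser::Simple. Instead of counting one tag name, it pushes every start tag and pops on the matching end tag, so it can at least warn when an end tag does not match the innermost open tag.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple 3.13;

# Assumption: elements that never take an end tag, so they are not pushed.
my %VOID = map { $_ => 1 }
    qw(area base br col embed hr img input link meta param source track wbr);

# Hypothetical helper: call it right after the caller has consumed the
# start tag for $tag. Returns the HTML up to the matching end tag.
sub get_element_with_stack {
    my ($parser, $tag) = @_;
    my @stack = ($tag);    # the start tag already seen by the caller
    my $html  = '';
    while (my $token = $parser->get_token) {
        if ($token->is_start_tag) {
            my $name = $token->get_tag;
            push @stack, $name unless $VOID{$name};
        }
        elsif ($token->is_end_tag) {
            if ($token->is_end_tag($stack[-1])) {
                pop @stack;
                return $html unless @stack;    # closed the element we started in
            }
            else {
                # Misnested input: report it and leave the stack untouched.
                warn "Unexpected end tag: ", $token->as_is, "\n";
            }
        }
        $html .= $token->as_is;
    }
    return $html;    # ran out of input before the element was closed
}

my $parser = HTML::TokeParser::Simple->new(
    string => '<div><span>a<span>b</span><br>c</span>d</div>',
);
while (my $token = $parser->get_token) {
    next unless $token->is_start_tag('span');
    print get_element_with_stack($parser, 'span'), "\n";
}
# prints: a<span>b</span><br>c
```

On well-nested input this should give the same result as the counter version. The stack only earns its keep when you want to notice misnesting; this sketch merely warns and carries on, so a badly misnested document will still run to the end of the input.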