Randal Schwartz wrote an article on filtering HTML tags (for forum comments and the like) in the May 2003 issue of The Perl Journal. He used XML::LibXML, which can parse HTML as well as XML and has a recovery mode that lets it cope with unbalanced tags and other malformed markup.
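
To give a rough idea of the approach, here is a minimal sketch of an allow-list filter built on XML::LibXML's HTML parser in recovery mode. It is not the code from the article: the `%ALLOWED` and `%DROP_WHOLE` tables, the `filter_html` name, and the choice of permitted tags and URL schemes are all assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical allow-list: element name => attributes permitted on it.
my %ALLOWED = (
    a => { href => 1 },
    b => {}, i => {}, em => {}, strong => {},
    p => {}, br => {},
);

# Hypothetical list of elements dropped together with their content.
my %DROP_WHOLE = map { $_ => 1 } qw(script style);

sub filter_html {
    my ($html) = @_;
    return '' unless defined $html && $html =~ /\S/;

    # recover => 2 asks the parser to repair malformed markup
    # (for example unbalanced tags) without reporting it.
    my $doc = XML::LibXML->load_html(string => $html, recover => 2);
    my ($body) = $doc->findnodes('//body');
    return '' unless $body;

    # findnodes returns a snapshot list, so the tree can be edited safely
    # while looping over it.
    for my $node ($body->findnodes('.//*')) {
        my $name   = lc $node->nodeName;
        my $parent = $node->parentNode or next;

        if ($DROP_WHOLE{$name}) {
            # Remove the element and everything inside it.
            $parent->removeChild($node);
        }
        elsif (my $attrs_ok = $ALLOWED{$name}) {
            # Keep the element, but strip attributes not on the list.
            for my $attr ($node->attributes) {
                my $attr_name = lc $attr->nodeName;
                my $keep      = $attrs_ok->{$attr_name};
                # Assumption: only http, https and mailto links are wanted.
                $keep = 0
                    if $attr_name eq 'href'
                    && $attr->value !~ m{\A(?:https?://|mailto:)}i;
                $node->removeAttribute($attr->nodeName) unless $keep;
            }
        }
        else {
            # Unknown element: drop the tag but keep its children.
            $parent->insertBefore($_, $node) for $node->childNodes;
            $parent->removeChild($node);
        }
    }

    # Serialize only what ended up inside <body>.
    return join '', map { $_->toString } $body->childNodes;
}

print filter_html('<p>Hello <b>world <a href="javascript:x" onclick="y">link</p>'), "\n";
```

Note that `toString` produces XML-style output (an empty `br` becomes `<br/>`), and that the sketch assumes plain ASCII input; real comment text would need its character encoding handled explicitly. A production filter would also want a more careful policy for URLs than the simple scheme check used here.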