Hi Kjetil,
I do use HTML::Tidy for cleaning up HTML into valid, properly encoded XHTML, as well as for cleaning up general XML files.
The trick is to install version 1.05_02 of this module. It has the option to pass in the path to a configuration file. That way you can tweak libtidy's behaviour to your heart's content. See http://tidy.sourceforge.net/ for details.
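Here is a minimal sketch of what that can look like. It assumes the constructor takes a config_file key in a hashref (as described in the module's POD) and that clean() is available in the version you have installed. The option values in the config are only illustrative choices, not my actual setup, and the temp file is just there to keep the example self-contained; normally you'd point at a config file that lives on disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::Tidy;

# Write a small tidy config to a temporary file.
# The options are standard tidy settings; the values are illustrative.
my ($fh, $config_path) = tempfile(SUFFIX => '.cfg', UNLINK => 1);
print {$fh} <<'CONFIG';
output-xhtml: yes
char-encoding: utf8
numeric-entities: yes
tidy-mark: no
wrap: 0
CONFIG
close $fh or die "Cannot close $config_path: $!";

# Hand the config file path to HTML::Tidy.
my $tidy = HTML::Tidy->new({ config_file => $config_path });

# Sloppy input: unclosed <p>, non-XHTML <br>.
my $dirty = '<html><head><title>Test</title></head>'
          . '<body><p>Hello<br>world</body></html>';

# clean() returns the tidied document as a single string.
my $xhtml = $tidy->clean($dirty);
print $xhtml;

# Any warnings or errors tidy reported go to STDERR.
print STDERR $_->as_string, "\n" for $tidy->messages;
```

If you'd rather only validate without rewriting, the same object's parse() method collects the messages without returning cleaned output.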
To install this specific version using CPAN, you'd say:
cpan> install PETDANCE/HTML-Tidy-1.05_02.tar.gz
I know this is a developer release, but I'm using it in production on a fairly high-traffic site (>100k pageviews a day), and it's completely stable.
HTH,
Rhesa