I use HTML::Tidy to clean up HTML into valid, properly encoded XHTML, and also to clean up general XML files.
The trick is to install version 1.05_02 of this module. It has the option to pass in the path to a configuration file. That way you can tweak libtidy's behaviour to your heart's content. See http://tidy.sourceforge.net/ for details.
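Here is a minimal sketch of what passing in a configuration file can look like. It assumes the `config_file` constructor option and the `clean` method as described in the HTML::Tidy documentation. The temporary file and the three settings in it are only illustrative choices, not something this version requires.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::Tidy;

# Write a small tidy configuration file. The option names are libtidy's;
# the values here are example choices, adjust them to taste.
my ($fh, $config_path) = tempfile(SUFFIX => '.cfg', UNLINK => 1);
print {$fh} <<'CONFIG';
output-xhtml: yes
char-encoding: utf8
tidy-mark: no
CONFIG
close $fh or die "Cannot close $config_path: $!";

# Hand the path to HTML::Tidy via the config_file option.
my $tidy = HTML::Tidy->new({ config_file => $config_path });

# Deliberately sloppy input: unclosed <p>, bare <br>.
my $dirty = '<html><head><title>Test</title></head>'
          . '<body><p>Hello<br>world</body></html>';

# clean() returns the tidied document as a string.
my $clean = $tidy->clean($dirty);

print $clean;
```

In practice you would probably keep the configuration in a regular file next to your script and pass its path directly. For XML input, libtidy's `input-xml: yes` setting is the one to look at.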
To install this specific version using CPAN, you'd say: