I have a large (35 MB) XML file that I need to extract some information from. At the moment I'm just doing a simple parse and dump to get to grips with XML::Simple, but I've run into a frustrating problem.
The code I'm using is very simple:
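#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

# Path to the XML file to parse
my $dumpfile = "QUT_EPrints.xml";
my $xml = XML::Simple->new;
# Read the whole file into a nested Perl data structure
my $data = $xml->XMLin($dumpfile);
print Dumper($data);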
When I run it, I get a single error of the type 'Invalid name in entity [Ln: 30370, Col: 95]' and the script stops. In every case the problem appears to be a character code, most often a newline character. If I remove the character and re-run the script, it continues happily for a while but then throws the same error in the same manner, only for another character on another line.
The odd thing is that when I've gone to remove the offending character, I can't see any difference between the offending character and the newline characters on lines surrounding it. It just doesn't appear to be unusual or different.
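One way to see what is really at the reported position is to print that line with every non-printable byte escaped. The following is only a rough sketch: the helper name show_line is made up for illustration, and it assumes that the line number in the error message is 1-based and counts newline-terminated lines in the file as stored on disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return one line of a file with every byte outside
# printable ASCII rewritten as \xNN, so invisible characters show up.
sub show_line {
    my ($path, $want) = @_;
    # Read raw bytes so no encoding layer hides or alters anything
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        next unless $. == $want;    # $. is the current input line number
        close($fh);
        $line =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord($1))/ge;
        return $line;
    }
    close($fh);
    return;    # the file has fewer lines than requested
}

# Usage: perl show_line.pl FILE LINE_NUMBER
if (@ARGV == 2) {
    my $shown = show_line(@ARGV);
    print defined $shown ? "$shown\n" : "No such line\n";
}
```

Run against the file and line number from the error message (for example QUT_EPrints.xml and 30370), this prints the line with any control characters, stray carriage returns, or non-ASCII bytes spelled out, which makes it easier to compare the offending line with its neighbours.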
Can anyone tell me how to avoid this error, or how to at least tell XML::Simple to ignore these errors?
Thanks,
Guy