I ran into a PDF writer that pads object numbers with leading zeroes to ten digits, which makes them look like the byte offsets in an xref table. Here are the file header and the first two objects:
%PDF-1.4
%<E2><E3><CF><D3>
0000000001 0 obj
<< /Creator (FreeFlow Accxes, Version: 13.0 Build: 153)
/Producer (FreeFlow Accxes, Version: 13.0 Build: 153)
/CreationDate (D:20090821155153-04'00')
>>
endobj
0000000002 0 obj
<< /Pages 0000000003 0 R
/Type /Catalog
/Metadata 0000000004 0 R
/OutputIntents[0000000005 0 R]
/MarkInfo << /Marked true>>
>>
endobj
And here's what the trailer looks like:
trailer
<< /Size 20
/Root 0000000002 0 R
/Info 0000000001 0 R
/ID [<2a3b5fbcdb53e232a3c1a23435fd2ae3><2a3b5fbcdb53e232a3c1a23435fd2ae3>]
>>
startxref
0000340789
%%EOF
Here's what happens when I run readpdf.pl:
$ readpdf.pl freeflow.pdf
Bad request for object 0000000002 at position 0 in the file
The PDF root node is not a dictionary.
CAM::PDF seems to have trouble resolving these zero-padded indirect references. Here's my patch to the dereference subroutine -- just a stab in the dark:
*** PDF.pm.bak Wed Sep 23 09:50:19 2009
--- PDF.pm Wed Sep 23 09:59:59 2009
***************
*** 1694,1699 ****
--- 1694,1700 ----
$key = $key->{value};
}
+ $key = int $key;
if (!exists $self->{objcache}->{$key})
{
#print "Filling cache for obj \#$key...\n";
This seems to work, but I don't have much confidence in it. The PDF writer is Xerox's controller for their wide-format printers and scanners (FreeFlow Accxes, according to the /Producer string).
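My guess at why the int call helps: Perl hash keys are strings, so a lookup keyed on '0000000002' will not match an entry stored under '2'. Here is a minimal sketch of that behavior. The %xref hash and the normalize_objnum helper are made up for illustration and are not CAM::PDF internals; I have not confirmed that this is what the module does.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a lookup table keyed by object number.
# The offsets are invented; this is not CAM::PDF's real data structure.
my %xref = (1 => 15, 2 => 230, 3 => 410);

# Hypothetical helper: turn an object number as written in the file
# into its canonical numeric form. Perl's string-to-number conversion
# is decimal, so the leading zeroes are simply dropped.
sub normalize_objnum {
    my ($raw) = @_;
    return int $raw;
}

my $raw = '0000000002';

# Hash keys are compared as strings, so the padded form does not match.
print "raw key:        ", (exists $xref{$raw} ? "found" : "missing"), "\n";

# After normalization the key stringifies to '2' and the lookup succeeds.
my $key = normalize_objnum($raw);
print "normalized key: ", (exists $xref{$key} ? "found" : "missing"), "\n";
```

Under these assumptions the first lookup misses and the second one hits, which would be consistent with the "Bad request for object 0000000002" message. Normalizing at the point where the reference is parsed, rather than in dereference, might be the cleaner place for the fix, but I don't know the code well enough to say.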