The method you want is $pdf->getPageContent($pagenum), which returns the decoded stream content. You can either work with that content as plain text, perhaps followed by a call to setPageContent(), or get a parsed representation via getPageContentTree().
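As a rough illustration of those two calls, here is a minimal sketch. The helper name show_page_content and the file name input.pdf are placeholders invented for this example, and the error handling is an assumption about how you would want failures reported, not something CAM::PDF prescribes.

```perl
use strict;
use warnings;

# Hypothetical helper: print a page's decoded content stream, then ask
# CAM::PDF for the parsed representation of the same page.
sub show_page_content {
    my ($file, $pagenum) = @_;

    # Loaded lazily so this file still compiles if CAM::PDF is not installed.
    require CAM::PDF;

    my $pdf = CAM::PDF->new($file)
        or die "Could not open $file\n";

    # Decoded drawing operators for the page, as one string.
    my $content = $pdf->getPageContent($pagenum);
    die "No content for page $pagenum\n" if !defined $content;
    print $content, "\n";

    # Parsed representation of the same stream.
    my $tree = $pdf->getPageContentTree($pagenum);
    die "Could not parse page $pagenum\n" if !$tree;

    return ($content, $tree);
}

# Example call; 'input.pdf' is a placeholder file name.
# show_page_content('input.pdf', 1);
```

If you edit the string form, setPageContent() is the call that writes it back to the page, as mentioned above.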
I played around with getPageContent, but it didn't seem to do what I wanted. For example, consider the following code:
If run on a particular PDF I'm trying to process, it produces the following output:
If I'm reading the PDF spec correctly, "/Fm1 Do" is an instruction to display the XObject named "Fm1" (not sure if that's the right terminology). My question is: is there a way to access the contents of this XObject? I think...
...gives me a data structure describing this XObject, but I'm not sure how to extract the actual drawing commands, which is what I really need.
I hope that's clear. Thanks again!
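For anyone following along: a "/Name Do" pair in a decoded content stream is the operator that paints the named XObject. Below is a minimal, library-free sketch for listing the XObject names a stream invokes. The function name xobject_names and the sample stream are made up for illustration. The regex is a simplification: it does not skip string literals or inline image data, and it ignores #xx escapes in names, so treat it as a quick diagnostic and not a real parser.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of XObjects painted by "Do"
# operators in an already-decoded content stream.
sub xobject_names {
    my ($content) = @_;
    my @names;

    # A name token is "/" followed by non-delimiter characters. "Do" must be
    # its own token, so it has to be followed by whitespace or end of input.
    while ($content =~ m! / ([^\s()<>\[\]{}/%]+) \s+ Do (?= \s | \z ) !gx) {
        push @names, $1;
    }
    return @names;
}

# Made-up sample stream, not taken from any real PDF.
my $sample = "q\n1 0 0 1 0 0 cm\n/Fm1 Do\nQ\nq /Im2 Do Q\n";
print join(', ', xobject_names($sample)), "\n";   # prints "Fm1, Im2"
```

Running this over the string returned by getPageContent() would show which resource names to look up in the page's resources.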
Oh, I see, I misunderstood your original request. It's not at all obvious how to get the xstream.
This might do the trick, but I have not tested it:
Thanks, that was super helpful! In case anyone has this problem in the future, here's the working code I came up with (this is inside a loop, hence the $_ and next):
As you can see it's a bit of a hack. Ideally there'd be a getParseTreeFromXObject() function (or something) that takes a page number and resource name and returns the parse tree of the associated xstream. I'd submit a patch but I'm not sure I understand the CAM::PDF internals well enough to produce something usable.
Thanks again for your help. This saved me a ton of time and frustration.