Posted on 2011-02-06 09:40:43.601814-08 by binarybits
Finding widths of text objects
Hi again,

I'm trying to write code to extract the "bounding rectangles" of text in PDF documents. I get a pageTree using getPageContentTree() and then pass my callback function to its render() method. My renderer then extracts most of the information needed to compute the text's position on the page, including the Tm, cm, Tfs, Tc, Tw, and Tz parameters. From these I seem to be able to estimate the upper, lower, and left bounds of each text object. To get the right bound, however, I need the widths of the glyphs in the text object, and since many fonts are variable-width, that requires unpacking the font itself.
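A minimal sketch of the right-bound arithmetic, following the text-advance formula in the PDF specification: each glyph advances the text position by ((w0/1000) × Tfs + Tc + Tw) × Th, where w0 is the glyph width in glyph-space units, Th = Tz/100, and Tw applies only to the single-byte space character (code 32). It assumes a simple single-byte font whose widths are already known (for instance from the font's /Widths array). The helper names (mat_mul, apply, text_advance, horizontal_bounds) are hypothetical and not part of CAM::PDF. It also ignores Trise and rotation, so it only reports the x coordinates of the run's start and end points.

```python
"""Sketch: left/right x of a text run from its glyph widths.

Assumes a simple (single-byte) font whose glyph widths are known in
glyph-space units (1/1000 of text space), e.g. from its /Widths array.
"""

Matrix = tuple[float, float, float, float, float, float]  # PDF (a b c d e f)


def mat_mul(m1: Matrix, m2: Matrix) -> Matrix:
    """Return m1 x m2 using PDF's row-vector convention."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def apply(m: Matrix, x: float, y: float) -> tuple[float, float]:
    """Transform the point (x, y) by matrix m."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def text_advance(codes: bytes, widths: dict[int, float], tfs: float,
                 tc: float, tw: float, tz: float) -> float:
    """Horizontal displacement in text space for a Tj-style string.

    Codes missing from `widths` count as width 0 here (an assumption;
    a real font may supply /MissingWidth instead).
    """
    th = tz / 100.0
    total = 0.0
    for code in codes:
        w0 = widths.get(code, 0.0)
        word_space = tw if code == 32 else 0.0  # Tw only for byte 32
        total += ((w0 / 1000.0) * tfs + tc + word_space) * th
    return total


def horizontal_bounds(codes: bytes, widths: dict[int, float], tfs: float,
                      tc: float, tw: float, tz: float,
                      tm: Matrix, cm: Matrix) -> tuple[float, float]:
    """Return (left_x, right_x) in user space; unrotated text only."""
    m = mat_mul(tm, cm)  # text space -> user space
    x0, _ = apply(m, 0.0, 0.0)
    x1, _ = apply(m, text_advance(codes, widths, tfs, tc, tw, tz), 0.0)
    return min(x0, x1), max(x0, x1)
```

For example, with widths {65: 500, 32: 250} ("A" and space), Tfs = 10, Tz = 100, Tm translating to (100, 200) and an identity cm, the string "A A" spans x = 100 to x = 112.5.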

Does the object that the rendering code passes to callback functions include information for computing the width of text objects? If not, can you recommend another way to do this? I think CAM::PDF must do this kind of calculation somewhere, since you need to know how wide one text object is to figure out the starting position of the next one on that line. But I've spent some time poking through the code and haven't found it.

Thanks a lot!

-Tim
Posted on 2011-02-06 10:23:57.871304-08 by cdolan in response to 13181
Re: Finding widths of text objects
CAM::PDF does indeed do this, but it does a pretty crappy job of it. If the document publishes font metrics, it becomes a moderately easy job (except for kerning...). The getStringWidth() method in CAM/PDF.pm shows the limitations quite clearly if you look at its source code.
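To illustrate the "moderately easy" case, here is a hedged sketch. It is not a transcription of CAM::PDF's getStringWidth(). It looks up widths in a simple font's /FirstChar and /Widths entries, falls back to /MissingWidth (which defaults to 0 in the PDF specification), and applies the numeric adjustments in a TJ array, where a positive number moves the next glyph left by n/1000 of the font size. The font dictionary is modeled with plain Python values, and the helper names are hypothetical. Kerning that the PDF producer baked into TJ arrays is handled, but pair-kerning data stored inside the font itself is not.

```python
"""Sketch: width of a TJ array from a simple font's metrics.

Assumes the font's /FirstChar, /Widths and /MissingWidth values have
already been extracted into plain Python numbers.
"""
from typing import Callable, Sequence, Union


def width_lookup(first_char: int, widths: Sequence[float],
                 missing_width: float = 0.0) -> Callable[[int], float]:
    """Return a function mapping a character code to its glyph-space width."""
    def lookup(code: int) -> float:
        i = code - first_char
        return widths[i] if 0 <= i < len(widths) else missing_width
    return lookup


def tj_width(items: Sequence[Union[bytes, float]],
             lookup: Callable[[int], float], tfs: float,
             tc: float = 0.0, tw: float = 0.0, tz: float = 100.0) -> float:
    """Text-space width of a TJ array: strings advance, numbers adjust."""
    th = tz / 100.0
    total = 0.0
    for item in items:
        if isinstance(item, (bytes, bytearray)):
            for code in item:
                word_space = tw if code == 32 else 0.0  # Tw only for byte 32
                total += ((lookup(code) / 1000.0) * tfs + tc + word_space) * th
        else:
            # A number n shifts the next glyph left by n/1000 * Tfs * Th.
            total -= (item / 1000.0) * tfs * th
    return total
```

With /FirstChar 65 and /Widths [500 600], the TJ array [(AB) 100 (A)] at a 10-point size measures 5 + 6 - 1 + 5 = 15 text-space units.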