Posted on 2010-12-03 22:28:05.504271-08 by cdolan in response to 13089
Re: Get formatted text from a PDF file
You can do it, but it's not straightforward. You need to search the page's content stream for constructs like this one:
BT
216 0 0 -216 142 291 Tm /F1.0 1 Tf (E) Tj
216 0 0 -216 231.754 291 Tm (m) Tj
216 0 0 -216 398.289 291 Tm (p) Tj
216 0 0 -216 510.402 291 Tm (lo) Tj
216 0 0 -216 679.996 291 Tm (y) Tj
216 0 0 -216 776.816 291 Tm (e) Tj
216 0 0 -216 887.242 291 Tm (e) Tj
ET
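All text rendering is enclosed in a "BT ... ET" (begin text / end text) block. The "Tf" operator takes two preceding operands: the font resource name ("/F1.0") and the font size ("1"). Each "Tm" sets the text matrix, which here scales the text by 216 (with the y-axis flipped) and moves it to the x/y position given by its last two operands. Each "Tj" then draws its string at that position. In the example above, the text reads "Employee".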
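Here is a rough sketch of that search, written in Python purely for illustration. It assumes you already have the page's content stream as decompressed text, and it handles only the operators used above (BT, ET, Tf, Tm, Tj). It does not handle nested parentheses in strings, string escapes, TJ arrays, Td/T* moves, or the current transformation matrix, so the positions it reports are text-space values from Tm and not final page coordinates. The names `extract_runs` and `TextRun` are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches, in priority order: a literal string "(...)" (escapes allowed, nested
# parentheses NOT handled), a name such as /F1.0, a number, or an operator.
TOKEN_RE = re.compile(
    r"\((?:\\.|[^\\()])*\)"        # literal string
    r"|/[^\s/()\[\]<>{}%]+"        # name object
    r"|[-+]?(?:\d+\.?\d*|\.\d+)"   # integer or real number
    r"|[A-Za-z'\"*]+"              # operator (BT, Tf, Tm, Tj, ET, ...)
)
NUMBER_RE = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)")


@dataclass
class TextRun:
    x: float     # e operand of the most recent Tm (text-space translation)
    y: float     # f operand of the most recent Tm
    font: str    # resource name from the most recent Tf, e.g. "/F1.0"
    size: float  # Tf size multiplied by the Tm horizontal scale (a operand)
    text: str    # raw string operand of Tj (escapes are not decoded)


def extract_runs(content: str) -> list[TextRun]:
    """Collect every Tj string inside BT ... ET with its font and Tm position."""
    runs: list[TextRun] = []
    operands: list = []
    font, font_size = None, 1.0
    scale, x, y = 1.0, 0.0, 0.0
    in_text = False
    for tok in TOKEN_RE.findall(content):
        if tok.startswith("("):
            operands.append(tok[1:-1])  # strip the surrounding parentheses
        elif tok.startswith("/"):
            operands.append(tok)
        elif NUMBER_RE.fullmatch(tok):
            operands.append(float(tok))
        else:
            # Anything else is an operator: act on it, then clear the operand stack.
            if tok == "BT":
                in_text = True
            elif tok == "ET":
                in_text = False
            elif tok == "Tf":
                font, font_size = operands[-2], operands[-1]
            elif tok == "Tm":
                scale, _, _, _, x, y = operands[-6:]
            elif tok == "Tj" and in_text:
                runs.append(TextRun(x, y, font, font_size * scale, operands[-1]))
            operands = []
    return runs


SAMPLE = (
    "BT 216 0 0 -216 142 291 Tm /F1.0 1 Tf (E) Tj "
    "216 0 0 -216 231.754 291 Tm (m) Tj 216 0 0 -216 398.289 291 Tm (p) Tj "
    "216 0 0 -216 510.402 291 Tm (lo) Tj 216 0 0 -216 679.996 291 Tm (y) Tj "
    "216 0 0 -216 776.816 291 Tm (e) Tj 216 0 0 -216 887.242 291 Tm (e) Tj ET"
)

if __name__ == "__main__":
    runs = extract_runs(SAMPLE)
    print("".join(r.text for r in runs))  # prints "Employee"
    for r in runs:
        print(r)
```

Keeping the font and Tm position on each run is what lets you recover formatting rather than just the characters. For example, you could group runs that share a y value into lines, or treat a change of font name or size as a change of style.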