Posted on 2012-01-07 06:06:07.91582-08 by jochenhayek in response to 13582
Re: Parsing tables in PDF files
I have been using CAM::PDF and also "pdftohtml",
and I would like to suggest the latter for your task.
As Chris mentioned, "you'd be doing a lot of heuristic coding", but in my experience you are better off working from the output of "pdftohtml -xml", provided you are comfortable parsing XML.
I used XML::Simple for that purpose, but you may be more experienced with XML and prefer another parser.
PDF has no concept of tables, so you would have to do a lot of guessing (that is, heuristics) as to which pieces of text form the columns, rows, and cells on these pages.
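To make the guessing more concrete, here is a minimal sketch of one such heuristic. It is written in Python only to keep it short; the same idea carries over to XML::Simple. It assumes the "pdftohtml -xml" output consists of page elements containing text elements with integer "top" and "left" attributes. The sample data, the function names, and the tolerance value are all made up for illustration, and real documents will need more careful tuning.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like "pdftohtml -xml" output (assumption:
# real files carry more elements and attributes than shown here).
SAMPLE_XML = """<pdf2xml>
<page number="1" top="0" left="0" height="1188" width="918">
<text top="100" left="50" width="40" height="12" font="0">Name</text>
<text top="101" left="200" width="30" height="12" font="0">Qty</text>
<text top="120" left="50" width="45" height="12" font="0"><b>Apple</b></text>
<text top="120" left="200" width="10" height="12" font="0">3</text>
</page>
</pdf2xml>"""


def rows_from_page(page, tolerance=3):
    """Guess table rows: text pieces whose 'top' lies within `tolerance`
    of the first piece in a row are treated as the same row."""
    items = []
    for node in page.iter("text"):
        # itertext() also picks up text nested in inline tags such as <b>.
        content = "".join(node.itertext()).strip()
        if content:
            items.append((int(node.get("top")), int(node.get("left")), content))
    items.sort()  # top to bottom, then left to right

    rows = []
    row_top = None
    for top, left, content in items:
        if row_top is None or top - row_top > tolerance:
            rows.append([])  # start a new row
            row_top = top
        rows[-1].append((left, content))
    # Within each row, order the cells by their horizontal position.
    return [[content for _, content in sorted(row)] for row in rows]


def extract_rows(xml_text, tolerance=3):
    """Return one list of guessed rows per page."""
    root = ET.fromstring(xml_text)
    return [rows_from_page(page, tolerance) for page in root.iter("page")]


if __name__ == "__main__":
    for page_rows in extract_rows(SAMPLE_XML):
        for row in page_rows:
            print(row)
```

This only guesses rows. Deciding which cells belong to the same column (for example by clustering the "left" values across rows) is a further heuristic on top of it.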
If you make real progress on this, let us know!
Personally, I am quite interested in that kind of utility too, and I may end up working on it myself.