Posted on 2006-08-17 19:37:05-07 by iaw4
empty cell or col bug or feature of getTableText in 2.113 dist?
It seems to me that getTableText has some strange behavior when it comes to empty cells/columns.
#!/usr/bin/perl -w
use strict;
use OpenOffice::OODoc;

my $doc= ooDocument(file => "test.ods");
$doc->{'field_separator'}= ",\t";
my $sheet= $doc->getTableText("Sheet1", 1000, 1000);
print $sheet;
and my ods test file contains
A   B   C   D   E   F
1   6       9       13
2           10
3                   15
4   7       12      16
5   8
(i.e., columns C and E are blank; 10 appears in column D and 15 in column F.) The output of my program, however, is
Argument "1.15_02" isn't numeric in subroutine entry at /usr/lib/perl5/site_perl//5.8.8/OpenOffice/OODoc/File.pm line 16.
1, 6, , 9, , 13
2, , 10, ,
3, , , , 15
4, 7, , 12, , 16
5, 8,
Posted on 2006-08-17 20:07:40-07 by mlcohen in response to 2789
Re: empty cell or col bug or feature of getTableText in 2.113 dist?
If you replace:
my $sheet= $doc->getTableText("Sheet1", 1000, 1000);
with
my $normsheet = $doc->normalizeSheet("Sheet1", 10, 10);
my $sheet= $doc->getTableText($normsheet, 10, 10);
things will work. Check out the documentation on normalizeSheet. Short version is that staroffice compresses tables, and it takes a long time to decompress them, so rather than doing it fully for you, you have to specify how much to decompress. The main symptom of not decompressing, or 'normalizing', the sheet is that funny things happen to blank cells, especially multiple blank cells in a row. The key is to normalize as small an area as possible, because it really will take a long time to normalize a large area. Try to know the row and column # beforehand, if possible.
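
To make the effect of normalizing concrete, here is a minimal sketch in plain Perl. It does not use OpenOffice::OODoc at all: the `[text, repeat count]` pairs are a hypothetical stand-in for the way a stored row can mark a run of identical cells with a repeat attribute, and `expand_row` is an illustrative helper, not a function of the module. The sample row assumes "3" in column A and "15" in column F with blanks in between.

```perl
use strict;
use warnings;

# Hypothetical in-memory model of one stored row: each entry is
# [cell text, repeat count]. Illustration only, not the module's
# internal format.
sub expand_row {
    my ($stored, $width) = @_;
    my @cells;
    for my $entry (@$stored) {
        my ($text, $repeat) = @$entry;
        # Expand the run, but never past the requested width.
        for (1 .. $repeat) {
            last if @cells >= $width;
            push @cells, $text;
        }
    }
    # Pad short rows so every row has exactly $width cells.
    push @cells, '' while @cells < $width;
    return \@cells;
}

# "3" in column A, a run of four blank cells, "15" in column F.
my $stored = [ [ '3', 1 ], [ '', 4 ], [ '15', 1 ] ];

# Reading the stored entries directly collapses the four blanks into one.
my $raw = join ',', map { $_->[0] } @$stored;

# Expanding first keeps every column in its place.
my $normalized = join ',', @{ expand_row($stored, 6) };

print "raw:        $raw\n";
print "normalized: $normalized\n";
```

The width limit is the reason to keep the normalized area small: a run of blanks that reaches to the edge of the sheet would otherwise be expanded cell by cell.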

Hope that helps,
Matt
Posted on 2006-08-20 02:06:55-07 by iaw4 in response to 2790
Re: empty cell or col bug or feature of getTableText in 2.113 dist?
hi matt: aha! thanks for the info. this is indeed exactly what I needed. quite a headscratcher.

is there a way to find out what the bottom right cell in a spreadsheet is? right now, I am using 1000,1000 simply because I don't know how to get the latter.

regards, /ivo

Posted on 2006-08-20 02:22:53-07 by iaw4 in response to 2797
Re: empty cell or col bug or feature of getTableText in 2.113 dist?
celebrated too early--the code fragment that first normalizes and then gettabletexts fails:
$ perl ods2csv.pl test2.ods
Argument "1.15_02" isn't numeric in subroutine entry at /usr/lib/perl5/site_perl//5.8.8/OpenOffice/OODoc/File.pm line 16.
wrong condition 'table:(covered-|)table-cell' at /usr/lib/perl5/site_perl//5.8.8/OpenOffice/OODoc/Text.pm line 2220
Posted on 2006-08-21 15:03:44-07 by iaw4 in response to 2790
Re: empty cell or col bug or feature of getTableText in 2.113 dist?
hi matt: not sure if you saw my response. I cannot get around an error message, no matter what I try (including your example, [except normsheet -> $normsheet]):
wrong condition 'table:(covered-|)table-cell' at /usr/lib/perl5/site_perl//5.8.8/OpenOffice/OODoc/Text.pm line 2220
the docs http://www.annocpan.org/~JMGDOC/OpenOffice-OODoc-2.027/OODoc/Text.pod suggest giving
$doc = ooDocument(file => 'report.sxc');
my $sheet = $doc->normalizeSheet('Sheet1', 7, 9);
but this fails, too. is this a bug or a feature? regards, /iaw
Posted on 2006-08-21 15:16:29-07 by mlcohen in response to 2805
Re: empty cell or col bug or feature of getTableText in 2.113 dist?
I don't know what to say...I have your sample code exactly as you posted before:
#!/usr/bin/perl -w
use strict;
use OpenOffice::OODoc;

my $doc= ooDocument(file => "test.ods");
$doc->{'field_separator'}= ",\t";
my $normsheet = $doc->normalizeSheet("Sheet1", 10, 10);
my $sheet= $doc->getTableText($normsheet, 10, 10);
#my $sheet= $doc->getTableText("Sheet1", 1000, 1000);
print $sheet;
and it works perfectly for me. I have your original line commented out, and I get the same results as you had, and then I replace it with the two lines about normsheet and everything works. Did you try the simple test case again, with the new code? Or are you trying in your full ods2csv script? Try the test case again, I bet it will work, and the problem is elsewhere in the script. And don't worry, this package is a bit confusing to use at first, just like anything that's powerful...it will take a bit of time to figure out what makes it tick. And sometimes, there are bugs :)

-Matt
Posted on 2006-09-06 01:37:48-07 by phil in response to 2806
Re: empty cell or col bug or feature of getTableText in 2.113 dist?
using:
XML::Twig 3.26 (and 3.27 development)
OpenOffice::OODoc::Text 2.225

I see the same problem when calling normalizeSheet. It appears twig doesn't like an expression being passed to it.

I need to finish something yesterday, so I haven't really looked harder, but as an untested workaround, I did this to _expand_row() in OODoc's Text.pm (line 2220):

original:
my @cells = $row->selectChildElements ('table:(covered-|)table-cell');
changed to:
my @cells = $row->selectChildElements ('table:covered-table-cell');
push(@cells, $row->selectChildElements ('table:table-cell'));
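
One thing to watch with a two-pass workaround like this: it returns the covered cells first and the ordinary cells after them, so the list is no longer in document order. The sketch below shows the difference on made-up data using plain Perl regular expressions. It is only an illustration of the ordering effect and does not call selectChildElements() or XML::Twig.

```perl
use strict;
use warnings;

# Element names of one row, in document order (made-up sample data).
my @names = (
    'table:table-cell',
    'table:covered-table-cell',
    'table:table-cell',
);

# Remember each child's position so the resulting order is visible.
my @row = map { +{ pos => $_, name => $names[$_] } } 0 .. $#names;

# One pass with the alternation keeps document order.
my @one_pass = grep { $_->{name} =~ /^table:(covered-|)table-cell$/ } @row;

# Two passes concatenated group the cells by type instead.
my @two_pass = (
    ( grep { $_->{name} eq 'table:covered-table-cell' } @row ),
    ( grep { $_->{name} eq 'table:table-cell' } @row ),
);

print "one pass: @{[ map { $_->{pos} } @one_pass ]}\n";    # 0 1 2
print "two pass: @{[ map { $_->{pos} } @two_pass ]}\n";    # 1 0 2
```

Whether the changed order matters depends on how the caller uses the list; any code that treats the list index as a column number would be affected.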

matt, what version of twig are you running?

Posted on 2006-09-06 09:49:08-07 by phil in response to 2924
Re: empty cell or col bug or feature of getTableText in 2.113 dist?
p.s. the above hack is only a workaround if you are writing to cells. If you are reading cells, there's more places in Text.pm that use the expression:
('table:(covered-|)table-cell')

All these places would need looking at, but the right thing is to figure out the real problem and let the appropriate module author know...I'll do that if nobody else does in the next few days. Info from matt on his twig version would help a lot.

Posted on 2006-09-06 10:17:10-07 by bernos in response to 2928
Re: empty cell or col bug or feature of getTableText in 2.113 dist?
Hi,
I have the same problem. This simple code:
use OpenOffice::OODoc;

my $archive = ooFile('chart1.sxc')
    or die "Cannot open input file\n";
my $content = ooDocument(archive => $archive)
    or die "Cannot extract content from input file\n";
my $table = $content->getTable(0,10,2);
produces this error:
wrong condition 'table:(covered-|)table-cell' at /usr/lib/perl5/site_perl/5.8.5/OpenOffice/OODoc/Text.pm line 2220
I'm using OpenOffice-OODoc-2.027 and XML-Twig-3.26
Posted on 2006-09-06 12:35:21-07 by jmgdoc in response to 2929
Re: empty cell or col bug or feature of getTableText in 2.113 dist?

The bug is pinpointed in XPath.pm 2.017, so Text.pm should not be patched, and the XML::Twig version (3.22 or later) doesn't matter.

The fix will be done in the next O::O release. However, in the meantime, the existing XPath.pm should be manually replaced by a provisional one, which is now available at
http://jean.marie.gouarne.online.fr/tech/oodoc/XPath.pm
Posted on 2006-09-06 14:28:57-07 by mlcohen in response to 2928
Re: empty cell or col bug or feature of getTableText in 2.113 dist?
I'm using OpenOffice-OODoc 2.026, with XML-Twig 1.303, it seems. At least, that's what I assume this line at the top of Twig.pm means:

# $Id: Twig_pm.slow,v 1.303 2006/05/26 08:07:14 mrodrigu Exp $

Seems like mine is a lot older than what the rest of you are using. I guess I should talk to my IT guys about upgrading it.

-Matt
Posted on 2006-09-06 14:54:43-07 by jmgdoc in response to 2931
Re: empty cell or col bug or feature of getTableText in 2.113 dist?
Don't worry about this Twig_pm.slow heading comment; it doesn't indicate the real Twig.pm version number.
Look at the $VERSION variable in the BEGIN block (near line 90) in Twig.pm.
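
Instead of opening Twig.pm, you can also ask Perl which version it actually loads. A minimal sketch follows; `module_version` is an illustrative helper name, and it relies only on the standard `VERSION` method that every Perl package inherits. List::Util is included just so the script prints something on an installation without XML::Twig.

```perl
use strict;
use warnings;

# Return the $VERSION a module reports, or undef if it cannot be loaded.
sub module_version {
    my ($module) = @_;
    # Accept only plain package names before building the require string.
    return unless $module =~ /^\w+(?:::\w+)*$/;
    return unless eval "require $module; 1";
    return $module->VERSION;
}

# XML::Twig is the module discussed here; List::Util ships with Perl.
for my $module ('XML::Twig', 'List::Util') {
    my $version = module_version($module);
    printf "%-12s %s\n", $module, defined $version ? $version : 'not installed';
}
```

The same check fits on the command line as `perl -MXML::Twig -e 'print XML::Twig->VERSION'`.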

Note that you could not run OpenOffice::OODoc 2.026 without XML::Twig 3.22 or later.
Posted on 2006-09-06 15:09:29-07 by mlcohen in response to 2932
Re: empty cell or col bug or feature of getTableText in 2.113 dist?
In that case, I have version 3.26.
Posted on 2006-09-11 21:16:22-07 by jmgdoc in response to 2930
Re: empty cell or col bug or feature of getTableText in 2.113 dist?
OpenOffice::OODoc 2.028 has been posted today.
This release fixes a bug affecting some table-related access methods (previously reported in this forum).
In addition, the OpenOffice::OODoc::Text manual section has been updated in order to describe the text box related methods (available but not documented in 2.027).
Posted on 2006-10-13 23:22:56-07 by cgrauer in response to 2998
Re: empty cell or col bug or feature of getTableText in 2.113 dist?
I installed 2.028 but I still experience this problem. getTableText merges empty cells (not all of them, but some; I found no regularity), also on normalized tables (the results are different for normalized and non-normalized tables, but both are wrong).
Posted on 2006-10-14 23:17:04-07 by cgrauer in response to 3257
Re: empty cell or col bug or feature of getTableText in 2.113 dist?
I hope this forum is the right place for the following. I got to know OODoc only two days ago and I'm unfamiliar with its development community etc. And I apologize in advance for my bad English, as I'm not a native speaker.

I spent part of my weekend examining the source code of OpenOffice::OODoc::Text to understand how it works and why I experienced the problems mentioned. Just in case anyone is interested in the solution, I post it here:

My problem was this: I use the function getTableText() to 'export' data from all the tables in a spreadsheet to csv files. I don't know in advance how many tables the sheet contains, nor their names and sizes. This is how I read the content from all the tables and store it in a hash-array structure:
sub y_getTableData {
    # returns a ref on a list containing hashes with tablename, table number
    # and the content in text format (csv)
    my $ood=shift; # ods document object
    my ( @array );
    my $count = scalar(($ood->getTableList())) - 1;
    for my $n (0 .. $count) {
        my ( %hash );
        $hash{'number'} = $n;
        $hash{'text'} = $ood->getTableText($n);
        $hash{'name'} = $ood->getAttribute($ood->getTable($n),'table:name');
        push ( @array, { ( %hash ) } );
    }
    return [ @array ];
}

Now, as I didn't know, OODoc by default normalizes only 32 rows and 26 cols (cf. options 'max_cols' and 'max_rows'). This caused the errors in my csv files because my tables contain up to 2100 rows and could contain even more in the future. So I first changed the option 'max_rows' to 65536. This produced correct csv files - after several cups of coffee though! So I thought of first counting the rows of the sheet and then calling getTableText with the parameters 'width' and 'length' to make OODoc normalize only the given size. But this is not possible, as the sheet is already normalized when I count the rows (or in other words: to count the rows, the table has to be normalized first).

So I decided to patch OODoc::Text, telling it to stop normalization as soon as it finds a row with the ROW_REPEAT_ATTRIBUTE (table:number-rows-repeated in the XML file) equal to or greater than the remaining number of rows to normalize (i.e. the number of already processed rows subtracted from the parameter 'length' or the option 'max_rows', and decreased by 2), assuming that this indicates the empty rows until the end of the sheet. As I found in the XML code of the sheet, OpenOffice leaves the last row as a separate row with no repeat attribute. This is why I decreased the value by 2.

Now I can change the option 'max_rows' to 65536 (and 'max_cols' to 256) to be sure every table will be normalized properly, irrespective of its size, and in a reasonable time. It may only cause problems if row number 65536 (i.e. the last row!) is not empty or if there are repeated non-empty rows until the end of the sheet.

These are the changes I made on OpenOffice::OODoc::Text:

Changing line 2350 to:
while ($rep > 1 && ($rownum < $length) && !$skip )

Changing line 2359 to:
if ( ( $rownum < $length ) && !$skip )

Inserting a new line after line 2349 with:
$skip = $rep > ( $length - $rownum - 2 ) ? 1 : 0;

Inserting a new line after line 2339 with:
my $skip = 0;

That's all. Will this affect the module's functions in any way? I can't imagine it would. Perhaps it is even a suggestion for the developers of OODoc. I'm not sure, but I think csv export of spreadsheets could be a common task for OODoc.
Posted on 2006-10-15 13:51:57-07 by jmgdoc in response to 3258
Re: empty cell or col bug or feature of getTableText in 2.113 dist?
Hopefully, you should not need to patch O::O::Text in order to change the size of the area to be normalized. In order to control this size, you can provide getTable() with the appropriate values as optional arguments. Example:
my $table = $doc->getTable($tablename, $height, $width);
my $text = $doc->getTableText($table);

When called with optional size arguments, getTable() automatically calls normalizeSheet() with the given values (and the default size is ignored).
In order to make the table safer for getTableText(), you could, *after* the code above, delete every possible extra row. Example:
my ($h, $w) = $doc->getTableSize($table);
# Delete from the bottom up so the indexes of the remaining rows do not shift.
for (my $i = $h - 1 ; $i >= $height ; $i--) {
    $doc->deleteRow($table, $i);
}

In addition, it's possible to remove the extra cells (if any) in each row in the normalized area, according to the difference between the normalized width ($width) and the external width ($w) returned by getTableSize().
Posted on 2006-10-15 20:28:29-07 by cgrauer in response to 3261
Re: empty cell or col bug or feature of getTableText in 2.113 dist?
I know getTable($tablename, $height, $width) - but what if I don't know height and width (as I described yesterday...)??

And getTableSize: the problem is not to "get" the empty rows (getTableText truncates empty rows at the end anyway); rather, normalization takes a long time if you normalize 65536 rows.

Btw., I found that my solution from yesterday does not work properly. It was a bit rash to post it, sorry. But I already have a new one ;-) I don't like to patch modules either, but I don't know another solution. So now I did the following with O::O::Text: After line 2316 I insert:
#-----------------------------------------------------------------------------
# checks whether a row contains data in any cell (result: 0) or not (result: 1)
sub _is_empty_row {
    my $self = shift;
    my $row = shift;
    my $cell_values = join('', $self->_get_row_content( $row ) );
    if ( $cell_values =~ /\w/si ) {
        return 0;
    } else {
        return 1;
    }
}

After line 2339 (now 2355) I insert:
if ( $self->{'truncate_empty_cells'} ) {
    # also stop when no rows are left (e.g. a completely empty sheet)
    while ( @rows && $self->_is_empty_row( $rows[$#rows] ) ) {
        pop( @rows );
    }
}

I change line 2359 (now 2382) to:
if ( ( $rownum < $length ) && (not $self->{'truncate_empty_cells'}) )
This assumes that the global option 'truncate_empty_cells' is set to '1'. This takes time as well, but not as much as normalizing the whole table. And unlike yesterday's solution, it works ;-)
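
For anyone who would rather not patch the module, the same trailing-row trimming can be sketched on plain Perl data after the table has been read. This is only an illustration under assumptions: the rows are taken to be array references of cell strings, and `is_empty_row` and `trim_trailing_empty_rows` are hypothetical helper names, not part of OpenOffice::OODoc.

```perl
use strict;
use warnings;

# True when no cell in the row contains a word character (the same
# /\w/ test as in the patch above).
sub is_empty_row {
    my ($row) = @_;
    return join('', map { defined $_ ? $_ : '' } @$row) !~ /\w/;
}

# Drop empty rows from the end only; empty rows in the middle are kept.
sub trim_trailing_empty_rows {
    my @rows = @_;
    pop @rows while @rows && is_empty_row($rows[-1]);
    return @rows;
}

my @rows = (
    [ '1', '6', '', '9' ],
    [ '',  '',  '', ''  ],    # blank row inside the data: kept
    [ '5', '8', '', ''  ],
    [ '',  '',  '', ''  ],    # trailing blank rows: removed
    [ '',  '',  '', ''  ],
);

my @trimmed = trim_trailing_empty_rows(@rows);
print scalar(@trimmed), " rows kept\n";
```

Note that the /\w/ test treats a cell holding only punctuation (a lone "-", for instance) as empty, so a stricter check may be needed for such data.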