Posted on 2005-03-31 09:46:02-08 by mcocmo62
Escape sequences in CSV-like lines
Hi,

I'm trying to parse a CSV-like file with the following characteristics:

-- SEMICOLON (;) is used to separate fields instead of COMMA (,): that's easy because I just need to adapt an example from the FAQs
-- SEMICOLON can be used in the values provided that it is 'escaped' by a QUESTION MARK ('?'): that's the hard part of the story
-- QUESTION MARK is also used for escaping QUESTION MARK itself, but it is not used for 'escaping' other characters in the field (to me, this is pretty ugly but that's the format!)

Please note that the parser must handle situations like:

1) "ABC???;DE;" whose value is "ABC?;DE" because the first two ? are interpreted as an 'escaped' ? and the sequence ?; is interpreted as an 'escaped' ;
2) "ABC??;" whose value is "ABC?" because the first ? escapes the second ?, which therefore does not escape the ;
3) "ABC?DE;" whose value is "ABC?DE" because ? does not 'escape' D

I found a rather clumsy solution that pre-parses the lines char-by-char before parsing them with RecDescent. It converts the escaped sequences into un-escaped sequences and converts value separators (i.e. ';') into out-of-band characters (i.e. '\e'); afterwards RecDescent uses '\e' as the value separator. It works fine but I suspect there is a better and more 'perl-wise' solution.

BTW: RecDescent is great, in particular because it allows context-sensitive parsing; I love it!

Thanks,
MCo.
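For reference, the char-by-char pre-parse described in the question could be sketched roughly as follows. This is only an illustration of the stated rules, not the poster's actual code: the helper name preparse_line is hypothetical, and it assumes that '\e' never occurs in the data itself.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite one raw line so that real separators
# become "\e" and the escape sequences "??" and "?;" are un-escaped.
# Assumes "\e" never appears in the input.
sub preparse_line {
    my ($line) = @_;
    my @chars = split //, $line;
    my $out   = '';
    my $i     = 0;
    while ($i < @chars) {
        my $c    = $chars[$i];
        my $next = $i + 1 < @chars ? $chars[$i + 1] : '';
        if ($c eq '?' && ($next eq '?' || $next eq ';')) {
            $out .= $next;    # escaped "?" or ";" becomes a literal
            $i += 2;
        }
        elsif ($c eq ';') {
            $out .= "\e";     # unescaped ";" is a field separator
            $i++;
        }
        else {
            $out .= $c;       # anything else, including a lone "?", is kept
            $i++;
        }
    }
    return $out;
}

# The pre-parsed line can then be split on "\e" (or handed to a grammar).
print join('|', split /\e/, preparse_line('ABC???;DE;ABC??;ABC?DE')), "\n";
# prints: ABC?;DE|ABC?|ABC?DE
```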
Posted on 2005-04-01 12:53:33-08 by jmcnamara in response to 283
Re: Escape sequences in CSV-like lines

I think that Parse::RecDescent is the wrong tool for this. You could use Text::CSV_XS instead, with a little massaging of the input to take care of condition 3:

    #!/usr/bin/perl -wl

    use Text::CSV_XS;
    use strict;

    my $csv = Text::CSV_XS->new({sep_char => ';', escape_char => '?'});

    my $line = 'ABC???;DE;ABC??;ABC?DE';

    $line =~ s/(\?[^?;])/?$1/g;

    $csv->parse($line);

    print join "|", $csv->fields();

    __END__

    prints: ABC?;DE|ABC?|ABC?DE

This will be significantly faster than a P::RD parser. You could also have a look at Text::xSV.

John.
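If pulling in a CSV module is not an option, the escaping rules from the original question can probably also be handled with a single regex in plain Perl, without a sentinel character. The following is a minimal sketch under that assumption; the function name split_fields is hypothetical and does not come from either post.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line into fields, treating "??" as a
# literal "?" and "?;" as a literal ";". A lone "?" before any other
# character is kept as-is.
sub split_fields {
    my ($line) = @_;
    my @fields;
    # A field is a run of escaped pairs ("??" or "?;") or single
    # characters other than ";", ended by a real ";" or end of string.
    while ($line =~ /\G((?:\?[?;]|[^;])*)(;|\z)/g) {
        my ($raw, $sep) = ($1, $2);
        (my $value = $raw) =~ s/\?([?;])/$1/g;    # un-escape the pairs
        push @fields, $value;
        last if $sep eq '';                       # reached end of string
    }
    return @fields;
}

print join('|', split_fields('ABC???;DE;ABC??;ABC?DE')), "\n";
# prints: ABC?;DE|ABC?|ABC?DE
```

Note that in this sketch a line ending in an unescaped ';' yields a trailing empty field, so 'ABC??;' comes back as ('ABC?', ''). Whether that is wanted depends on whether ';' is a separator or a terminator in the real file format.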