Saturday 26 December 2009

Erlang CSV Parser

At long last I've finished the CSV parser I've been working on.  Most of the technical problems have been string matching problems, and that a string is a list of integers that can be matched singly prefixing a dollar sign to the char that should be matched.

The parser uses a state machine implemented by using a process and messages the next char until it reaches the end of the file/string at which point it messages an eof atom and awaits the process to message back the parsed CSV.  In the end the parser used quite a lot of erlang's features including processes, funs and parametrised macros and the end result was pretty clean.  It can take a plain string or an IO device such as a file as the string source which is handled in a nice way using funs to get the next char.  I found the switch from OOP to functional confusing at first since I wanted to use an input stream but the functional method I discovered is probably smaller than the Java stream based approach I would have used otherwise.

Other notable CSV parsers include ppolv's and an FSM OTP behaviour from Praveen Ray of Yellowfish.  I'm really impressed by the OTP behaviour, I can imagine this would improve reuse once comfortable with erlang and the OTP.

No comments: