Justus Perlwitz

Parsing S-expressions with Python 3

This is a simple s-expression parser written in Python 3. It understands symbols and numbers and uses tuples to represent the data internally.

The Tokenizer

First, the tokenizer adds padding to left and right parentheses. Then it splits the raw token stream on space characters. As a result, empty items will appear, since `"("` turns into `" ( "`, which is split into `""`, `"("`, `""`. That is why the surrounding `filter` discards empty items using `bool`. Finally, the tokenizer turns all numeric tokens into `int`s. It returns the resulting token stream as a list.
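The steps above can be sketched as follows. This is a minimal sketch, not necessarily the exact code behind this post: the function name `tokenize` and the use of `str.isdecimal` to recognise numeric tokens are assumptions, and negative numbers or whitespace other than plain spaces are not handled.

```python
def tokenize(text):
    """Turn raw s-expression text into a list of tokens (str or int)."""
    # Pad parentheses with spaces so that splitting isolates them.
    padded = text.replace("(", " ( ").replace(")", " ) ")
    # Splitting on a single space leaves empty strings behind;
    # filter(bool, ...) discards them because "" is falsy.
    raw_tokens = filter(bool, padded.split(" "))
    # Assumption: a token made only of decimal digits is a number.
    # Everything else (symbols and parentheses) stays a string.
    return [int(tok) if tok.isdecimal() else tok for tok in raw_tokens]


print(tokenize("(+ 1 (* 2 30))"))
# ['(', '+', 1, '(', '*', 2, 30, ')', ')']
```

Splitting on `" "` rather than calling `split()` with no argument is what produces the empty items in the first place; it is kept here only to mirror the description.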

The Parser

The token stream can now be parsed.

A valid s-expression is either an atom (int or symbol) or a list of s-expressions. Since we operate on a token stream, the parser has to peek at the current token and then either parse a list or parse an atom.

A recursive descent parser lends itself to this type of recursive grammar.
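As a rough illustration, here is one way such a recursive descent parser could look. It is a sketch under assumptions: the names `parse`, `parse_expr` and `parse_list` are hypothetical, the input is a token list of strings and `int`s as described in the tokenizer section, and the error handling is an addition of this sketch.

```python
def parse_expr(tokens, pos):
    """Parse one s-expression starting at tokens[pos].

    Returns a pair: the parsed expression and the index of the next
    unread token.
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    token = tokens[pos]  # peek at the current token
    if token == "(":
        return parse_list(tokens, pos + 1)
    if token == ")":
        raise SyntaxError("unexpected ')'")
    # Anything else is an atom: an int or a symbol (kept as a str).
    return token, pos + 1


def parse_list(tokens, pos):
    """Parse list items up to the matching ')'; pos is just past the '('."""
    items = []
    while pos < len(tokens) and tokens[pos] != ")":
        item, pos = parse_expr(tokens, pos)  # recurse for each item
        items.append(item)
    if pos >= len(tokens):
        raise SyntaxError("missing ')'")
    # Lists are represented as tuples; pos + 1 skips the closing ')'.
    return tuple(items), pos + 1


def parse(tokens):
    """Parse a complete token list that holds exactly one s-expression."""
    expr, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after expression")
    return expr
```

For example, `parse(["(", "+", 1, "(", "*", 2, 3, ")", ")"])` returns `("+", 1, ("*", 2, 3))`. The two mutually recursive functions mirror the grammar directly: `parse_expr` chooses between atom and list, and `parse_list` calls back into `parse_expr` for every item.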

All done!

Date created:
November 8, 2015

