Regenerate parser tables in PLY

Parsers are generally difficult to develop using TDD, because there is a strong tendency to specify a complete grammar up front. Worse, the tests you want to write early – did my tokenizer successfully recognize those numbers? – tend to become invalid as you extend the grammar and add more requirements about the structure of the input translation units.

I am developing a parser in Python using the excellent PLY module. PLY, like some other parser generators, offers a way around this: you can specify the *start rule* that you wish to use with the parser. This means that, in theory, you can run your simple expression tests with a start rule of ‘expression’ and not worry about a semicolon at the end, or declaring a module, or whatever other structural bits you add later.

But PLY generates its parser tables and code the first time the parser is built. That is, when the first test case runs, it will generate the parsetab.py file with the tables configured for whatever start rule you may have specified. Any subsequent invocation will re-use the parsetab.py file instead of generating a new one, even if that invocation asks for a different start rule.

Here’s some code I wrote to get around this problem. I cache the generated parsers, keyed by start rule, and delete the stale table files before generating a parser for a start rule I have not seen yet. The assumption is that the generated parsers will be the same as long as the start rules are the same. This may not be true for you – if you are passing additional arguments to the parser generator, you may need to include those arguments as part of the cache key.

Nonetheless, here you go:

	# Assumes "import os" at the top of the test module, and a class-level
	# "_Parser_cache = {}" dictionary on the test case.
	def assertParseAst( self, text, matcher, **kwargs ) :

		# Keep a cache of parsers we have generated (start rule changes)
		start_rule = kwargs.get( 'start', close.parser.start )

		if start_rule not in self._Parser_cache :
			# Remove stale PLY output so the tables are regenerated for this start rule
			for stale in ( 'parsetab.py', 'parser.out' ) :
				if os.path.exists( stale ) :
					os.remove( stale )
			print( "Generating parser with start rule =", start_rule )
			self._Parser_cache[ start_rule ] = close.parser.parser( **kwargs )

		self.parser = self._Parser_cache[ start_rule ]
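
If you do pass additional arguments to the parser generator, the cache key could cover all of them rather than just the start rule. Here is a minimal sketch of that idea. The names `make_cache_key` and `ParserCache` are hypothetical and not part of PLY, and `factory` stands in for whatever function builds your parser (in my case that would be `close.parser.parser`). It also assumes that `repr()` of each argument value is stable between calls, which holds for strings, numbers and booleans but not for arbitrary objects.

```python
def make_cache_key(**kwargs):
    """Build a hashable cache key from every parser-generator argument."""
    # Sort by name so argument order does not matter. repr() lets
    # unhashable values (lists, dicts) take part in the key.
    return tuple(sorted((name, repr(value)) for name, value in kwargs.items()))


class ParserCache:
    """Cache parsers by the full set of keyword arguments used to build them."""

    def __init__(self, factory):
        # Hypothetical: any callable that builds a parser from keyword arguments.
        self._factory = factory
        self._parsers = {}

    def get(self, **kwargs):
        key = make_cache_key(**kwargs)
        if key not in self._parsers:
            # First request for this combination: build and remember it.
            self._parsers[key] = self._factory(**kwargs)
        return self._parsers[key]
```

With this, `cache.get( start='expression', debug=False )` and `cache.get( debug=False, start='expression' )` return the same parser, while changing either argument builds a new one. You would still want to remove the stale parsetab.py inside the factory before generating, as in the code above.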
