[TIP] About testing Pygments lexers
Tim Hatch
tim at timhatch.com
Wed Aug 25 10:46:22 PDT 2010
On 8/25/10 9:47 AM, Olemis Lang wrote:
> Hello, I don't recall if I already asked this, but I'm looking for a
> test lib able to detect if a Pygments lexer (e.g. using RegexLexer
> rules) will get stuck
By "stuck" do you mean an infinite loop, or an error token? Off the top
of my head, the only case where an infinite loop can happen is if a
regex can match the empty string [although matching the empty string to
enter a new state is used by a couple of lexers, and is probably safe].
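As a quick screen for that case, here is a sketch that walks a
Pygments-style tokens table -- a dict mapping state names to
(pattern, token) rules -- and flags any regex that can match the empty
string. The helper name and the toy table are just for illustration:

```python
import re

def empty_matching_rules(tokendef):
    # Return (state, index, pattern) for every rule whose regex can
    # match the empty string -- the classic infinite-loop candidate.
    hits = []
    for state, rules in tokendef.items():
        for i, (pattern, _token) in enumerate(rules):
            if re.compile(pattern).match('') is not None:
                hits.append((state, i, pattern))
    return hits

toy = {'root': [(r'\s*', 'Whitespace'), (r'\w+', 'Name')]}
suspects = empty_matching_rules(toy)   # flags the r'\s*' rule
```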
> and also able to generate state & rule coverage
> by running tests
The best place to do this would be inside RegexLexer's
get_tokens_unprocessed method. Although that code hasn't changed since
2007, IMO the best way to add coverage would be to replace the match
function ('rexmatch' there, self._tokens[...][...][0] more generally)
with your own recording function, using a subclass:
def __init__(self, *args, **kwargs):
    # (pass your subclass, not RegexLexer, to super() in real code)
    super(RegexLexer, self).__init__(*args, **kwargs)

def do_patch(self):
    def record(s, i, rexmatch):
        def closure(text, *args):
            self.covered[s][i] = True
            return rexmatch(text, *args)
        return closure
    self.covered = {}
    for statename, regexes in self._tokens.iteritems():
        self.covered[statename] = dict.fromkeys(range(len(regexes)))
        for i in range(len(regexes)):
            # the processed rules are tuples, so rebuild rather than assign
            regexes[i] = (record(statename, i, regexes[i][0]),) + regexes[i][1:]
def missed_coverage(self):
    def shorten(s):
        if len(s) < 20:
            return s
        return s[:20] + '...'
    missed = 0
    for statename in sorted(self._tokens):
        print "State", statename
        regexes = self._tokens[statename]
        for i in range(len(regexes)):
            if not self.covered[statename][i]:
                # self.tokens (the unprocessed table) still has the
                # pattern strings, so print those rather than _tokens
                print "Missed", i, shorten(self.tokens[statename][i][0])
                missed += 1
    return missed
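The record()/closure trick above is a general pattern, by the way:
wrap any callable so that calling it marks a slot in a coverage dict
before delegating. A standalone sketch, using nothing beyond the
standard library (the names are just for illustration):

```python
def record(covered, key, fn):
    # Wrap fn so each call sets covered[key] before delegating.
    def closure(*args, **kwargs):
        covered[key] = True
        return fn(*args, **kwargs)
    return closure

covered = {'upper': False}
wrapped = record(covered, 'upper', str.upper)
result = wrapped('abc')   # delegates to str.upper, marking coverage
```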
Your test function could do something like this (assuming stdout
capture, like nose does):

def test_lexer():
    text = "lambda x: 1+1\n"
    lex = PythonLexer()
    # ... potentially monkeypatch
    list(lex.get_tokens_unprocessed(text))
    assert lex.missed_coverage() == 0
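To get a self-contained feel for the whole workflow without Pygments
installed, here is a toy sketch (the rule table and the lex() loop are
made up for illustration, not Pygments' real matcher): a miniature
rule-matching loop that records which rules fired, then lists the ones
that never did.

```python
import re

# Toy rule table in the (pattern, token) shape a tokens dict uses
rules = {'root': [(r'\d+', 'Number'), (r'[a-z]+', 'Name'), (r'\s+', 'Whitespace')]}
covered = dict((s, dict.fromkeys(range(len(r)), False))
               for s, r in rules.items())

def lex(text, state='root'):
    pos, out = 0, []
    while pos < len(text):
        for i, (pattern, token) in enumerate(rules[state]):
            m = re.match(pattern, text[pos:])
            if m:
                covered[state][i] = True   # record that this rule fired
                out.append((token, m.group()))
                pos += m.end()
                break
        else:
            raise ValueError('stuck at position %d' % pos)
    return out

toks = lex('abc 123')
missed = [(s, i) for s in covered for i in covered[s] if not covered[s][i]]
# 'abc 123' exercises all three rules, so nothing is missed
```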
Tim
More information about the testing-in-python mailing list