15

I'm wondering how to effectively test a lexer (tokenizer).

The number of combinations of tokens in a source file can be huge, and the only way I've found is to write a batch of representative source files and expect a specific sequence of tokens for each of them.

SuperJMN
  • 413
  • 3
  • 9
  • 3
    Having some "golden exemplar" tests where you lex a whole file and compare the resulting tokens to the expected output might be a good idea, but I wouldn't rely solely on that - it's hard to pin down exactly where the problem is. Build up from simpler, unit-level test cases where you give a small input (just a line or two) to lex. – jonrsharpe Feb 04 '21 at 12:38
  • 4
    Give fuzzing a shot. – Kyslik Feb 04 '21 at 20:32

3 Answers

16

Your grammar probably has some rules for each token on how it can be produced (for example, that a { signifies a BLOCK_START token, or that a string-literal token is delimited by '"' characters). Start writing tests for those rules and verify that your lexer produces the correct token in each case.
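As a sketch of what such per-rule tests might look like, here is a minimal regex-based toy lexer with one small test per token rule. The token names mirror the examples above; the rules and the `lex()` helper are assumptions for illustration, and your real lexer's interface will differ.

```python
import re

# Toy token rules, one regex per token kind. Order matters:
# INCREMENT must come before PLUS so "++" isn't split ("maximal munch").
TOKEN_SPEC = [
    ("BLOCK_START", r"\{"),
    ("BLOCK_END", r"\}"),
    ("STRING", r'"[^"]*"'),
    ("INTEGER_LITERAL", r"\d+"),
    ("INCREMENT", r"\+\+"),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(source):
    """Return a list of (kind, text) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unrecognized input at offset {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

# One small, focused test per token rule:
assert lex("{") == [("BLOCK_START", "{")]
assert lex("}") == [("BLOCK_END", "}")]
assert lex('"hi"') == [("STRING", '"hi"')]
assert lex("42") == [("INTEGER_LITERAL", "42")]
assert lex("++") == [("INCREMENT", "++")]
assert lex("+") == [("PLUS", "+")]
```

Each assertion exercises exactly one rule, so a failure points directly at the rule that broke.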

Once you have a test for each single token, you can add a few tests for interesting combinations of tokens. Focus here on token combinations that would reveal an error in your lexer. The token combinations don't have to make sense to a parser for your language, so it is entirely valid to use +++++12 as input and expect the tokens INCREMENT, INCREMENT, PLUS, INTEGER_LITERAL(12) as output.
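The `+++++12` example might look like this as a test, using a throwaway lexer fragment covering only the tokens involved (an assumption for illustration). Greedy, left-to-right matching is what splits the run of plus signs as `++`, `++`, `+`:

```python
import re

# Only the token rules needed for this combination test;
# INCREMENT is tried before PLUS so "++" wins where possible.
PATTERN = re.compile(r"(?P<INCREMENT>\+\+)|(?P<PLUS>\+)|(?P<INTEGER_LITERAL>\d+)")

def lex(source):
    return [(m.lastgroup, m.group()) for m in PATTERN.finditer(source)]

# The input is nonsense to a parser, but perfectly valid lexer input:
assert lex("+++++12") == [
    ("INCREMENT", "++"),
    ("INCREMENT", "++"),
    ("PLUS", "+"),
    ("INTEGER_LITERAL", "12"),
]
```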

And finally, make sure you have some tests for faulty inputs, where the lexer will not be able to recognize a token. And although I mention them last, they certainly don't have to be the last tests you create. You could just as well start with these.
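An error-path test might look like the following sketch. It assumes the lexer raises an exception on input it cannot tokenize; a real lexer might instead emit an `ERROR` token, in which case the test would assert on that token instead.

```python
import re

# A tiny lexer that only knows digits and whitespace (an assumption
# for illustration); anything else is a lexical error.
VALID = re.compile(r"\d+|\s+")

def lex(source):
    out, pos = [], 0
    while pos < len(source):
        m = VALID.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        out.append(m.group())
        pos = m.end()
    return out

# '@' has no token rule, so lexing must fail...
try:
    lex("12 @ 34")
    assert False, "expected a SyntaxError"
except SyntaxError as e:
    # ...and the error should pinpoint the offending input.
    assert "@" in str(e)
```

Checking the error message (or position) as well as the exception type helps ensure the lexer reports *where* it failed, not just *that* it failed.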

Bart van Ingen Schenau
  • 71,712
  • 20
  • 110
  • 179
  • 6
    I find it easier to write the tests for erroneous input _first_ - quite often the easiest tests to write, and to make pass. And helps you get the error-handling side of the interface right, too (particularly helpful in C, not as much so in languages with exceptions, I guess). – Toby Speight Feb 04 '21 at 21:26
15
  • If you're writing the lexer yourself, this seems like an ideal case for test-driven development.

    While “the number of combinations of tokens in a source file can be huge,” the number of branches in your source code is finite. The idea is that before you add a feature to your code (for instance, an edge case the lexer must handle), you write the test first.

  • If you're using an existing lexer that you feed with specific rules, maybe a similar approach can be applied as well. In other words, you start with a very simple syntax (which doesn't do anything useful), and add more and more tests while gradually making the rules more complex.
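As a hedged sketch of what such a TDD progression might look like for one lexer feature (quoted strings), here are the tests one might add in order, each written before the code that made it pass. The `lex_string` helper is a stand-in for the real lexer's string-token rule:

```python
import re

# Toy "string token" recognizer; the regex allows escaped characters
# inside the literal. This is an illustrative assumption, not any
# particular language's string syntax.
STRING = re.compile(r'"(?:\\.|[^"\\])*"')

def lex_string(source):
    m = STRING.match(source)
    if m is None:
        raise SyntaxError("not a string literal")
    return m.group()

# Test 1 (first red/green cycle): the simplest possible string.
assert lex_string('""') == '""'
# Test 2: ordinary contents.
assert lex_string('"abc"') == '"abc"'
# Test 3: an escaped quote inside the string.
assert lex_string('"a\\"b"') == '"a\\"b"'
# Test 4: an unterminated string must be rejected.
try:
    lex_string('"abc')
    assert False, "expected a SyntaxError"
except SyntaxError:
    pass
```

The point of the ordering is that each test forces a small, specific extension of the rule; by the time test 4 is green, the accumulated suite documents what "correct string handling" means.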

Arseni Mourzenko
  • 134,780
  • 31
  • 343
  • 513
  • 3
    I'm not convinced. Say you're adding support for quoted strings; writing a single failing test would tick the "test-driven development" box, but wouldn't lead to a good test suite. What you really want is a _suite_ of tests that specify what "correct string handling" looks like, several of which will probably pass on your first attempt. So you end up back where you started: working out how to choose what tests to write. – IMSoP Feb 04 '21 at 22:07
  • 2
    @IMSoP: you're not convinced by the first case, or the second one? For the second one, I'm not convinced either; thus the usage of *maybe* in my answer. If it's the first one, well, a lexer is an ordinary algorithm, and TDD plays nice specifically for algorithms. In your case of quoted strings, if you add **the simplest code** which makes the first unit test pass, you *won't* have a fully-working quoted strings feature; so you'll need additional tests. And then more tests, and some more. – Arseni Mourzenko Feb 04 '21 at 23:38
  • 2
    In other words, TDD is not about writing one test and then developing a whole feature. It's about writing a test, and then looking at the most elementary way to modify the code in order for the test to go from red to green. And this is crucial: if you don't do that, TDD becomes as useless as the practice of adding some random tests to an existing codebase in order to reach a given code coverage requested by the boss/tech lead. – Arseni Mourzenko Feb 04 '21 at 23:41
  • 2
    It seems to me that you're answering "what tests should I write?" with "one at a time", which isn't an answer. I understand the principle of making only the change necessary for one test, but how do you choose that test? And once it's green, how do you choose the next test? – IMSoP Feb 05 '21 at 00:16
  • 2
    @IMSoP: First think of how you want it to work. Then think of how it should fail. *Then* think of how a complete idiot could screw with it and make sure it fails in a predictable way. That's TDD --- and not just for lexers. – Greg Burghardt Feb 05 '21 at 04:00
  • I’d love to see you write a working lexer for Swift using Test driven design. With about 20,000 characters valid in identifiers, operators etc., nested comments, string interpolation, multi line strings. Have fun. – gnasher729 Feb 05 '21 at 06:29
  • @gnasher729: what makes you think that TDD doesn't work for complex code? I haven't used it on very complex logic, nor do I know anyone who worked on very complex logic either, but everything I learned about TDD indicates that there is no threshold, in terms of LOCs or function points, above which you can't use TDD. If you have any info, I'm interested. – Arseni Mourzenko Feb 05 '21 at 09:22
  • 2
    Honestly, I share @IMSoP's doubts here. As much as I love TDD, the problem with it is that, in many cases, the number of possible ways the code can break increases more than linearly (often quadratically or even exponentially) with the number of interacting features. Which means that if you blindly do the basic TDD loop of 1) add a failing test, 2) make the test not fail, 3) repeat, you may end up with poorly tested code that can be broken later without any of the tests detecting it (because you didn't add any extra tests for things that already happened to work but which someone might break). – Ilmari Karonen Feb 05 '21 at 12:59
  • @IlmariKaronen: you may share your thoughts in a [separate question](https://softwareengineering.stackexchange.com/q/421893/6605). – Arseni Mourzenko Feb 05 '21 at 13:45
  • Arseni: Just making sure that you parse every one character identifier correctly in Swift requires a few thousand test cases. – gnasher729 Feb 06 '21 at 21:31
  • @gnasher729: it is the second time you mention Swift. Do you have any example of a piece of Swift code which would be difficult to parse? Looking at [the basics](https://docs.swift.org/swift-book/LanguageGuide/TheBasics.html) as well as a few Swift projects on GitHub, the only unusual thing I see is the `if let` block. Not sure that an `if let` would complicate so much the parser. – Arseni Mourzenko Feb 06 '21 at 21:52
7

One alternative that others aren't mentioning is to use a generative testing approach, like QuickCheck from Haskell, to generate the edge cases from the grammar you've defined.

Then, once those are generated, you would hand-write some additional cases that you would expect to fail (e.g., assertRaises).

This would have the advantage of automatically updating as your grammar changes, reducing the time spent maintaining tests. It would also have a fun meta side-effect: you would be maintaining tests for your test maintainer ;)
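The core idea can be sketched with only the standard library; QuickCheck (Haskell) or Hypothesis (Python) add shrinking and smarter generation on top of this. Generate random token sequences from the grammar, render them to source text, lex that text, and require a round trip. The token set and generators below are assumptions for illustration:

```python
import random
import re

# Per-token generators: each produces a random valid spelling of its token.
TOKENS = {
    "PLUS": lambda rng: "+",
    "INTEGER_LITERAL": lambda rng: str(rng.randint(0, 10**6)),
    "IDENT": lambda rng: "".join(rng.choice("abcxyz")
                                 for _ in range(rng.randint(1, 8))),
}
PATTERN = re.compile(r"(?P<PLUS>\+)|(?P<INTEGER_LITERAL>\d+)|(?P<IDENT>[a-z]+)|\s+")

def lex(source):
    # The unnamed whitespace alternative has lastgroup None, so it's dropped.
    return [(m.lastgroup, m.group())
            for m in PATTERN.finditer(source) if m.lastgroup]

rng = random.Random(0)  # fixed seed keeps failures reproducible
for _ in range(1000):
    expected = [(kind, TOKENS[kind](rng))
                for kind in rng.choices(list(TOKENS), k=rng.randint(0, 20))]
    # Join with spaces so adjacent tokens can't merge (e.g. two integers).
    source = " ".join(text for _, text in expected)
    assert lex(source) == expected, (source, expected)
```

A real property-based library would also shrink a failing sequence down to a minimal counterexample, which is where most of the debugging value comes from.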

A T
  • 727
  • 1
  • 8
  • 16
  • 1
    +1 property-based testing and/or fuzzing is the approach I'd take too. Humans aren't particularly good at manually generating test cases. Test case generators can uncover many more bugs. The approach is fairly simple. Write a set of generators that produce a set of tokens and then generate a set of input strings from them, verify that the input set of tokens is identical to the output set. Take a set of valid input strings, apply mutations that invalidate them and verify you get the right errors. – Wes Toleman Feb 05 '21 at 03:52