35

Should I write unit tests for complex regular expressions in my application?

  • On the one hand: they are easy to test because the input and output formats are often simple and well-defined, and they can become so complex that tests aimed specifically at them are valuable.
  • On the other hand: they themselves are seldom part of the interface of some unit. It might be better to only test the interface and do that in a way that implicitly tests the regexes.

EDIT:

I agree with Doc Brown who in his comment notes that this is a special case of unit testing of internal components.

But as internal components regexes have a few special characteristics:

  1. A single line regex can be really complex without really being a separate module.
  2. Regexes map input to output without any side effects and hence are really easy to test separately.
Lii
  • 12
    "they themselves are seldom part of the interface of some unit." - if your classes have interesting code buried deep under the interface, break up your classes. This is an example of how thinking about tests can improve design. – Nathan Cooper May 07 '16 at 11:38
  • 3
    The same question in a more general manner: which internal components should be unit tested? See http://programmers.stackexchange.com/questions/16732/unit-testing-internal-components – Doc Brown May 07 '16 at 12:08
  • Sorta related, see Regex101. They have a section to write unit tests for your regex. For example: https://regex101.com/r/tR3mJ2/2 – David says Reinstate Monica May 07 '16 at 19:37
  • 3
    Disclaimer - this comment is my humble opinion: **1** first of all I believe that the complex regexps are pure evil - also see http://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/ **2** real value of testing such expressions comes when you test them over a large database of real data http://blog.codinghorror.com/testing-with-the-force/ **3** I have a strange feeling that these tests are not *unit* tests exactly – Boris Treukhov May 07 '16 at 21:23

6 Answers

102

Testing dogmatism aside, the real question is whether unit testing complex regular expressions provides value. It seems pretty clear that it does (regardless of whether the regex is part of a public interface) if the regex is complex enough, since it allows you to find and reproduce bugs and guard against regressions.
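As a minimal sketch of what testing a regex as its own unit might look like: the pattern `VERSION_RE` and the "earlier bug" below are made up for illustration and are not taken from any real project. The test functions use plain `assert`, so they can be collected by pytest or called directly.

```python
import re

# Hypothetical pattern, for illustration only: a version string such as
# "1.4.0" or "2.0.13-rc1" (three numeric parts, optional suffix).
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z]+))?")


def test_extracts_numeric_parts():
    match = VERSION_RE.fullmatch("2.0.13")
    assert match is not None
    assert match.groups() == ("2", "0", "13", None)


def test_extracts_optional_suffix():
    match = VERSION_RE.fullmatch("1.4.0-rc1")
    assert match is not None
    assert match.group(4) == "rc1"


def test_rejects_missing_patch_number():
    # Pins a (made-up) earlier bug: "1.4" must not be accepted.
    assert VERSION_RE.fullmatch("1.4") is None
```

Once a bug is reproduced as a failing test like the last one, the same test keeps it from coming back after later edits to the pattern.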

JacquesB
  • 25
    +1, though if a regular expression is complex enough that this is an issue, then it probably makes sense to move it into a "wrapper" unit with appropriate methods (`isValid`, `parse`, `tryParse`, or whatnot, depending exactly how it's being used), so that the client code doesn't have to know that it's currently implemented using a regex. The wrapper unit would then have detailed tests, which -- again -- wouldn't need to know the current implementation. These tests, of course, are *de facto* testing the regex, but in an implementation-agnostic way. – ruakh May 07 '16 at 20:51
  • 1
    A regex is a program, though in a specialized and very terse language. As such, testing is appropriate for nontrivial expressions ... And certainly the code which is invoking the expression should be tested, which may implicitly test the regex. – keshlam May 07 '16 at 22:37
  • 6
    @ruakh Well said. The benefit to a wrapper class for a regex is that you can neatly replace it with ordinary code if that becomes necessary. Code with complex input/output should always have unit testing, because it is remarkably difficult to debug without. If you need to refer to documentation to understand the code's effects, it should have unit tests. If it's just a quick 1:1 mapping like type conversion, then there's no problem. Regexes get past that point of requiring docs very quickly. – Aaron3468 May 08 '16 at 02:09
  • @JacquesB: Although I think I agree with you I also think this answer would be better if you make more of an argument for why *regexes* should deserve any special treatment, such as tests for them even if they are not part of a unit interface. Why does such test provide value for regexes but not for other internal things? – Lii May 08 '16 at 15:36
  • 4
    @Lii: Regexes do not deserve any special treatment. The regex is the unit in this case, so we unit-test it. – JacquesB May 08 '16 at 15:50
  • 1
    @ruakh I was about to write an answer to that effect. I agree that using regex is an implementation detail. What matters is that things validate when they're supposed to, and fail to validate when they're supposed to. Test the `FooValidator` for its inputs and outputs, then you've no concern over *how* it's being done. ++ – RubberDuck May 08 '16 at 16:10
  • 1
    Unit testing a regex is also a good way to show the cases where it should work and the cases where it shouldn't. So when the next developer comes along and sees the regex, they can just look at the unit tests to get an idea of what it does. – dyesdyes May 08 '16 at 22:29
  • 1
    @RubberDuck: But everything is an implementation detail at some level. I don't see any good reason not to test a complex regex as a unit. – JacquesB May 08 '16 at 22:52
21

Regex can be a powerful tool, but you cannot trust a complex regex to keep working after even a minor change.

So create lots of tests that document the cases it should cover. And, if it is used for validation, create lots of tests that document the cases it should reject.

Whenever you need to change your regexes you add the new cases as tests, modify your regex and hope for the best.
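The workflow above could be sketched as two lists of cases plus one small helper. Everything here is an assumption made for illustration: the 24-hour time pattern `TIME_RE`, the case lists, and the helper name `failing_cases` are hypothetical.

```python
import re

# Hypothetical validation pattern: a 24-hour clock time such as "09:30".
TIME_RE = re.compile(r"(?:[01]\d|2[0-3]):[0-5]\d")

# Cases the regex must accept, and cases it must reject.
SHOULD_MATCH = ["00:00", "09:30", "19:05", "23:59"]
SHOULD_FAIL = ["24:00", "9:30", "12:60", "12:3", "12:30pm", ""]


def failing_cases(pattern, should_match, should_fail):
    """Return the inputs for which the pattern gives the wrong verdict."""
    wrong = [s for s in should_match if pattern.fullmatch(s) is None]
    wrong += [s for s in should_fail if pattern.fullmatch(s) is not None]
    return wrong


def test_time_regex():
    assert failing_cases(TIME_RE, SHOULD_MATCH, SHOULD_FAIL) == []
```

When the regex has to change, the new case goes into one of the two lists first; the helper then reports exactly which inputs the modified pattern gets wrong.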

If I were in an organization that in general didn't use unit tests, I would still write a test program to test every regex we used. I would even do it on my own time if I had to; my hair does not need to lose any more colour.

Bent
3

Regular expressions are code along with the rest of your application. You should test that the code overall does what you expect it to do. This has several purposes:

  • Tests are runnable documentation. They clearly demonstrate what you need the code to do. If it is tested, it is important.
  • Future maintainers can be certain that if they modify it, the tests will ensure that the behavior is unchanged.

As there is an extra hurdle to overcome by having code in a different language embedded with the rest, you most likely should give this extra attention for the benefit of maintenance.

1

In short, you should test your application, period. Whether you test your regex with automated tests that run it in isolation, as part of a bigger black box or if you just fiddle around with it by hand is secondary to the point that you need to make sure it works.

The main advantage of unit tests is that they save time. They let you test the thing as many times as you like, now or at any point in the future. If there's any reason at all to believe that your regex will at some point be refactored, tweaked, given more constraints etc., then yeah, you probably want some regression tests for it. Otherwise, whenever you do change it, you'll have to spend an hour thinking through all the edge cases to make sure you didn't break it. That, or you learn to live with being scared of your code and simply never change it.

sara
  • 3
    A rule of thumb I've come to realize: if I needed docs to write and inspect the code, then I will need a unit test. They've saved me many headaches, catching null pointers, None types, and incorrect output. They also give the end user the ability to repair your code to spec with minimal effort when it inevitably breaks. – Aaron3468 May 08 '16 at 02:15
-1

On the other hand: they themselves are seldom part of the interface of some unit. It might be better to only test the interface and do that in a way that implicitly tests the regexes.

I think with this you answered it yourself. Regexes in a unit are most likely an implementation detail.

What goes for testing your SQL probably also goes for regexes. When you change a piece of SQL, you probably run it through some SQL client by hand to see if it yields what you expect. The same goes for regexes: when I change one, I run it in a regex tool against some sample input to see if it does what I expect.

What I find useful is a comment near the regex with a sample of text which it should match.
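One possible way to keep such a sample comment from drifting away from the regex, sketched here in Python, is to write it as a doctest so the sample can also be executed. The log-line pattern `LOG_RE` and the function `parse_log_line` are hypothetical names chosen only for this example.

```python
import re

# Hypothetical pattern: a log line such as "[WARN] disk almost full".
LOG_RE = re.compile(r"\[(DEBUG|INFO|WARN|ERROR)\]\s+(.*)")


def parse_log_line(line):
    """Split a log line into (level, message), or return None.

    Sample inputs the regex should and should not match:

    >>> parse_log_line("[WARN] disk almost full")
    ('WARN', 'disk almost full')
    >>> parse_log_line("[INFO]   started")
    ('INFO', 'started')
    >>> parse_log_line("WARN disk almost full") is None
    True
    """
    match = LOG_RE.fullmatch(line)
    return match.groups() if match else None
```

Running `python -m doctest` on the file checks the samples in the docstring; read as plain text, they still serve as the comment next to the regex.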

Jonathan Leffler
Christiaan
  • "_When you change a piece of SQL you probably run it through some SQL client by hand to see if it yields what you expect._" But this kind of answers the question the other way... If I need to, or think it's useful to, test the regexes by hand, then I should make a unit test for that instead. This is exactly what makes it tricky to decide! – Lii May 07 '16 at 11:05
  • It really depends. What you want your unit tests for is the ability to make changes. How often do you change a specific regex? If the answer is often then by all means create a test for it. – Christiaan May 07 '16 at 11:08
  • If the regex is part of a bigger whole and it is difficult to test you can always extract the regex into its own module/function and write tests for that module/function/unit. – Christiaan May 07 '16 at 11:10
  • 8
    All other things being equal, it's better to have an automated test than a "test by hand." – Robert Harvey May 07 '16 at 13:46
  • 1
    Why would you not test a regex using automation? – Tony Ennis May 07 '16 at 19:55
  • 1
    It is part of a method and all I was trying to say is that there is no need to specifically test the regex if you already test that method. But if you do you are probably better off extracting the regex into a separate function that you test in isolation. – Christiaan May 09 '16 at 11:29
-5

If you have to ask, the answer is yes.

Suppose some FNG comes along and thinks he can "improve" your regex. Now, he's a FNG, so automatically an idiot. Exactly the kind of person who should not touch your precious code under any circumstances, ever! But maybe he's related to the PHB or something, so there's nothing you can do.

Except you know the PHB is going to drag you kicking and screaming back to this project to "maybe give the guy some pointers about how you made this mess" when everything goes bad. So you write down all the cases that you have carefully considered when building your beautiful masterwork of expressiondom.

And since you've written them all down, you're two-thirds of the way to having a set of test cases, since - let's face it - regex test cases are dead easy to run once you've got the framework built.

So now, you have a set of edge conditions, alternatives, and expected results. And suddenly the test cases are the documentation just as promised in all those me-too Agile blog posts. You just point out to the FNG that if his "improvement" doesn't pass the existing test cases, it's not much of an improvement, is it? And where are his proposed new test cases that demonstrate some problem with the original code, which since it works he doesn't need to be modifying, ever!!!

aghast
  • 3
    What is FNG? This does not seem a bad answer to me, but it is missing a definition for FNG (googling for it just gives results that are unrelated, so maybe this answer was just downvoted because of FNG?) – CoffeDeveloper May 09 '16 at 08:30
  • 1
    I suspect that Google took you to the right place. ;-) (https://en.wikipedia.org/wiki/FNG_syndrome) – aghast May 09 '16 at 15:41
  • Unless you are an absolute programming genius, there will be more experienced programmers who look at what you do the way you look at the new guy. You may want to consider being more humble. – Thorbjørn Ravn Andersen May 21 '16 at 22:05