
Question:

Why not just use BDD-style acceptance tests and do away with unit tests, integration tests, functional tests and all other tests?


I have been doing some research into the differences between different types of testing as well as different software development philosophies.

Looking at the list of software development philosophies and reading about different types of testing led me to research why we use unit tests, integration tests and functional tests if we already have acceptance tests.

I hate to be the one to kill the sacred cow, but I feel that if you have acceptance tests in a BDD-style environment (Given, When, Then), then you don't really need any other tests, since with acceptance tests alone you will be delivering exactly the functionality the business requires.


Some arguments for using unit, integration and functional tests:

  • Unit tests ensure things at a lower level are correct: Who cares? If the business is happy with what they've received, then what difference does it make?

  • Unit tests speed up long-term development time: Is it really worth the trade-off? Where there is code there are bugs, and tests are not exempt from this. The time spent writing and debugging 5,000 unit tests vs. the time spent debugging 50 acceptance tests seems like an obvious choice?

  • Unit tests/TDD ensures good architecture: No they don't. Each unit test is a design decision and is extremely vulnerable to overall poor architecture.

  • Integration tests ensure the overall performance is accurate and optimal: Why can't this be just another Then condition in the acceptance tests? Why not just use Selenium within the acceptance tests?

  • Functional tests ensure that output is as expected: Acceptance tests enhance this by providing an answer at build time rather than waiting for potentially incorrect results.

  • TDD ensures that when changes are implemented, nothing else breaks in the process: Acceptance tests will ensure this too since everything's based on business requirements/features.

Noobcanon
  • @toniedzwiedz: I've updated the labels. I'm talking about acceptance tests within the project using a tool like cucumber or jbehave. – Noobcanon Feb 15 '15 at 21:21
  • A code path either is never used and so shouldn't be there, or is potentially used and is therefore a potential source of a bug. So if it takes 5000 unit tests to cover all code paths, how could the same be done with 50 acceptance tests? – Ben Aaronson Feb 15 '15 at 23:01
  • @BenAaronson: Managing what shouldn't be there doesn't really matter that much and is really just a waste of time and money. If you really wanted to remove what shouldn't be there then acceptance tests would ensure that you didn't remove something you shouldn't have. – Noobcanon Feb 16 '15 at 20:40
  • Yeah, the "shouldn't be there" is for completeness. Assuming code that shouldn't be there is removed, do you think there'd still be a huge discrepancy between the number of acceptance tests needed to provide full coverage and the number of unit tests needed? If so, how would you explain it? – Ben Aaronson Feb 17 '15 at 08:55
  • The code is based on the acceptance tests, so as long as they don't fail, your code is good. *Given, When, Then*s will ensure that the product owner gets what she wants every time, and that's all that matters. I think that the trade-off between time saved in writing unit tests vs. time saved in not writing unit tests is highly debatable in an environment where acceptance tests are your only source of guidance. – Noobcanon Feb 17 '15 at 20:10
  • This could be a decent question if it weren't so ranty. @Noobcanon, you've got to take into consideration the workflow of a developer. Tests at a finer level than Acceptance work for the developer. – MetaFight Feb 17 '15 at 21:30
  • @MetaFight: You could always provide an edit to make it less ranty? – Noobcanon Feb 18 '15 at 19:55
  • I could, but so could you ;) – MetaFight Feb 18 '15 at 21:03
  • At the core, this is a very good question. I just disagree on the following points: 1) if you want to achieve something near 100% code coverage, it's naive to think that a significantly smaller number of acceptance tests will do (it's the cyclomatic complexity of the codebase that matters, not the kinds of tests you write); 2) technically, "acceptance" tests are no different from integration or functional tests; it's just that they are written or verified by end users of the software (good luck getting them to do that), or simply written from the perspective of what delivers "business value". – Rogério Feb 23 '15 at 00:38
  • [My gorilla is better than your shark](http://blog.stackoverflow.com/2011/08/gorilla-vs-shark/). – Feb 24 '15 at 05:44
  • When computing the time it takes to write unit tests, consider that programmers _will test_ any code they've just written -- for instance by manually refreshing a browser page they're looking at. Turning that into a short bit of code that calls the function instead (a "unit test") is often _easier_ than setting up some elaborate situation in the app where the new code can be tested by hand, and it can be checked into source control and repeated automatically from then on. – RemcoGerlich Feb 24 '15 at 10:20

3 Answers


Acceptance tests act at a very different level than, say, unit tests.

A unit test is very precise: it deals with one method, sometimes a part of a method. This makes it a perfect choice for regression testing. You make a change. A test fails while it passed during the previous commit. Great, you can easily pinpoint the source of the regression both in time (from commit N-1 to commit N) and space (this method of this class).
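For illustration, here is a minimal sketch of such a test (the class and method names are hypothetical, using JUnit 5). It exercises exactly one method, so when it turns red between two commits, you know immediately where to look:

    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertEquals;

    // Hypothetical class under test: applies a percentage discount to a price.
    class PriceCalculator {
        int applyDiscount(int priceInCents, int discountPercent) {
            return priceInCents - (priceInCents * discountPercent / 100);
        }
    }

    class PriceCalculatorTest {

        @Test
        void appliesTenPercentDiscount() {
            // Covers a single method: if this fails at commit N while it
            // passed at commit N-1, the regression is in applyDiscount
            // (or in this test itself).
            assertEquals(900, new PriceCalculator().applyDiscount(1000, 10));
        }
    }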

With acceptance tests, good luck if some start to fail. One bad change may cause one acceptance test to fail, or maybe ten, or a hundred. When you look at those hundred tests turning red without any hint about the location of the bug, the only thing which comes to mind is to revert to the previous commit and start over.

Imagine acceptance tests as tests which treat the system as a black box and test a feature of the system. The feature may involve thousands of methods, may rely on a database, a message queue service, and a few dozen other things. An acceptance test doesn't care how much is involved, nor about any of the magic which happens behind the scenes. This also means that multiple acceptance tests may rely on the same method, which in turn means that a regression in a single method often leads to several failed acceptance tests. I have had cases where a regression suddenly caused approximately fifty system and acceptance tests to fail, without giving any hint about the location of the bug.

Unit tests consider the system as a white box. They are aware of the concrete implementations and test a specific method, not a feature. By using mocks and stubs, unit tests achieve enough isolation not to be affected by the rest of the system: if the method works, the tests succeed; if the method has a regression, those tests fail. If another method somewhere in the code base doesn't work, those unit tests still pass.
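To show what that isolation looks like in practice, here is a small sketch (the CheckoutService and PaymentGateway names, and the use of Mockito and JUnit 5, are assumptions made for this example):

    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertTrue;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.when;

    // Hypothetical collaborator that would normally call a real payment provider.
    interface PaymentGateway {
        boolean charge(String accountId, int amountInCents);
    }

    // Hypothetical class under test; the gateway is injected, so it can be stubbed.
    class CheckoutService {
        private final PaymentGateway gateway;

        CheckoutService(PaymentGateway gateway) {
            this.gateway = gateway;
        }

        boolean checkout(String accountId, int amountInCents) {
            return amountInCents > 0 && gateway.charge(accountId, amountInCents);
        }
    }

    class CheckoutServiceTest {

        @Test
        void successfulChargeCompletesCheckout() {
            // The stub replaces the real gateway, so a regression in the payment
            // code (or anything behind it) cannot make this test fail.
            PaymentGateway stubGateway = mock(PaymentGateway.class);
            when(stubGateway.charge("acct-1", 500)).thenReturn(true);

            assertTrue(new CheckoutService(stubGateway).checkout("acct-1", 500));
        }
    }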

Imagine the following diagram which represents methods calling other methods:

          [Diagram: a call graph of methods calling other methods]

Unit testing can be represented this way:

          [Diagram: the same call graph with unit tests, each covering a single method and replacing its dependencies with stubs]

For example, unit tests U3 and U3′ are not affected by bugs in method 6, because it is replaced by a stub. Nor are they affected by regressions in method 2, because stub 5 doesn't rely on method 2. In the same way, U8″ doesn't care about method 7, because 9 is a stub and doesn't rely on 11, which in turn uses 7.

Imagine you make a commit and your CI informs you that U8′ now fails. Where would you search for the problem?

  • In U8′. Maybe the code it tests is correct, but the test is not. This happens, for instance, when requirements change: you change the code but forget to reflect the change in the tests.

  • In method 8. There could be a regression.

  • In stub 9. Maybe you implemented a change in code and in tests, but forgot to change the stubs.

On the other hand, you are sure the problem is not with method 11 or method 7.

Now this is what acceptance or system tests would look like:

          [Diagrams: the same call graph with acceptance/system tests A/S1 and A/S2, each spanning many methods from the entry point down]

Imagine you make a commit and your CI informs you that A/S2 failed. Guess where the problem is. Much more difficult, isn't it?

Obviously, another aspect is that, as I described above, a small change can cause many acceptance tests to fail. For instance, a regression in method 7 may cause both A/S1 and A/S2 to fail. With unit tests, a regression in method 11 may cause U11 to fail, but will not affect, for instance, U8.

This leads to a huge benefit: the time spent locating regressions. I've seen programmers who, when a regression is found, simply revert the source to the latest working commit and start over, because the code is a mess and they don't enjoy spending hours debugging in the hope of finding the origin of the problem. This is unfortunate, especially when commits are not made as frequently as they could be.

With unit tests, you don't waste all this time. They tell you that you have an issue with a given method of a given class, so you can focus your attention on the method concerned as soon as you discover the regression.

  • Unit tests ensure things at a lower level are correct: Who cares? If the business is happy with what they've received, then what difference does it make? [...]

Have you worked with really bad projects where you can't make a change without breaking at least ten things in random locations?

Unit tests don't magically solve this problem. However, you should care that things are correct at a lower level. Quick hacks have a substantial maintenance cost. On the other hand, if the only tests programmers have are acceptance tests (and tight deadlines, and no incentive to do their job correctly), then when an acceptance test fails because somewhere the font should be 12px but appears to be 10px, they may eventually just end up writing:

this.font = 12;

and waiting until testers get back to them to say that the font is now 12px in situations where it should be 14px.

  • Unit tests speed up long-term development time: Is it really worth the trade-off? Where there is code there are bugs, and tests are not exempt from this. [...]

Tests won't magically make bugs go away. However, practice shows that writing a test, checking that it fails, then implementing the feature and checking that the test passes is a discipline which generally leads to fewer bugs.

Similarly, why would anyone do code reviews? Reviewers are not immune to lapses of attention, and there are plenty of cases where a bug is missed by several reviewers and maintainers. That being said, you get more bugs without code reviews than with them.

Is it worth it? For your personal small app, not really. For business-critical code, yes, indeed.

  • Unit tests ensure good architecture: No they don't. [...]

They do, to some extent. By forcing methods to be tested in isolation, you force programmers to rethink coupling. Unit testing usually leads to short methods, classes which do one and only one thing (the single responsibility principle), dependency injection, etc.

A 4000 LOC method which relies on a few hundred other methods and requires access to the database is impossible to unit test (while it is perfectly normal to add acceptance tests to such a method).
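To make the contrast concrete, here is a compressed, hypothetical before/after (the invoice classes and the VAT rule are invented for illustration, not taken from any real project). The first version can only be exercised against a live database; the second lets the calculation be unit tested with a stubbed repository:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    // Before: hard to unit test. The method creates its own database access,
    // so even the small piece of business logic (the VAT rule) cannot be
    // exercised without a real database.
    class InvoiceReportBefore {
        double totalWithVat(String customerId) throws SQLException {
            try (Connection c = DriverManager.getConnection("jdbc:postgresql://billing-db/billing");
                 Statement s = c.createStatement();
                 ResultSet r = s.executeQuery(
                         "SELECT SUM(amount) FROM invoices WHERE customer = '" + customerId + "'")) {
                r.next();
                return r.getDouble(1) * 1.20;
            }
        }
    }

    // After: the calculation is separated from the data access. The VAT rule
    // can now be unit tested by passing in a stubbed InvoiceRepository.
    interface InvoiceRepository {
        double totalFor(String customerId);
    }

    class InvoiceReport {
        private final InvoiceRepository invoices;

        InvoiceReport(InvoiceRepository invoices) {
            this.invoices = invoices;
        }

        double totalWithVat(String customerId) {
            return invoices.totalFor(customerId) * 1.20;
        }
    }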

In Going TDD in the middle of the project, I describe exactly this sort of project. Bad architecture led to a situation where it was practically impossible to add unit tests later. Had unit tests been considered from the beginning, the disaster would have been smaller. The code would still be bad (since it was written by the same programmers who were fine with writing a 400 LOC spaghetti method), but not that bad.

  • Integration tests ensure the overall performance is accurate and optimal [...]

Don't know. Never heard of that. Performance should be checked by tests corresponding to the performance non-functional requirements.

  • Functional tests ensure that output is as expected [...]

Don't know.

  • TDD ensures that when changes are implemented, nothing else breaks in the process: Acceptance tests will ensure this too since everything's based on business requirements/features. [...]

No, this is the role of regression testing. TDD's purpose is mainly to avoid writing tests which are not testing anything. This is why in TDD, the test should fail before you implement a feature: otherwise, either the test is wrong, or you actually don't need the feature.
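As a rough sketch of that order of work (the Slugifier class is invented for this example): the test is written and run first, it fails because the behaviour doesn't exist yet, and only then is the production code written to make it pass:

    import org.junit.jupiter.api.Test;
    import static org.junit.jupiter.api.Assertions.assertEquals;

    class SlugifierTest {

        @Test
        void replacesSpacesWithDashesAndLowercases() {
            // Step 1: write this first and watch it fail (red). A test that has
            // never failed proves nothing about the code it is supposed to cover.
            assertEquals("hello-world", Slugifier.slugify("Hello World"));
        }
    }

    // Step 2: the simplest implementation that turns the test green.
    class Slugifier {
        static String slugify(String input) {
            return input.trim().toLowerCase().replace(' ', '-');
        }
    }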

Arseni Mourzenko
  • Again, is it really worth the trade-off though? The time spent on debugging vs. the time spent on managing unit tests (even using TDD) is questionable. The font-size 12/14 scenario wouldn't really be an issue since that could all fall under the feature in BDD and could easily be updated. Much easier than TDD if you think about changing 5 acceptance tests vs. changing hundreds of other tests. Many TDD projects have failed for architectural reasons, even ones that used DI frameworks, etc. Look at `Disciplined Agile Delivery` with the new role of an architecture owner. – Noobcanon Feb 15 '15 at 22:25
  • @Noobcanon: unfortunately, I don't know of any studies showing whether the trade-off is or isn't worth it, and in what cases. In my experience, (1) time wasted searching for bugs largely outweighs time spent writing unit tests on all but small projects, and (2) projects that have tests are *always* in better shape than those that don't. That being said, point 1 is just my personal observation and point 2 is logical: programmers who don't care about their work will write crappy code and won't test anything, and programmers who care will write good code in the first place. – Arseni Mourzenko Feb 15 '15 at 22:50
  • Acceptance Tests will still guide you in the general direction of the bug. If it still takes you a long time to figure it out with that level of a hint then it could potentially be a training issue. – Noobcanon Feb 16 '15 at 20:41
  • @Noobcanon: The problem with acceptance tests is that in order to figure out the location of the issue which causes them to fail, you need to debug. A single failed unit test shows you precisely the problematic method, which means that you don't need to debug to find the location of the bug. – Arseni Mourzenko Feb 16 '15 at 21:01
  • It's not a given that a unit test == immediate problem indication and resolution, simply because generally there will be a lot of failed unit tests at a time, so it still takes time to investigate the exact cause. This is no different to the kind of general indication an acceptance test will provide. – Noobcanon Feb 16 '15 at 21:08
  • I challenge your assertion that a bug will result in "a lot of failed unit tests at a time." If that's the case, then you're not writing your unit tests correctly. One bug should translate to one (maybe two) *unit* tests failing. – MetaFight Feb 16 '15 at 22:11
  • @MetaFight: That's not necessarily true. On very large systems developers won't have time to go through every single unit test to ensure that the tests they create don't duplicate **in any way** what already exists anywhere. As time passes this will result in more and more unit tests containing various responses to various parts of the system. It's been proven many times that TDD does not guarantee good design simply because each test is a design decision and that could be a good or bad decision. – Noobcanon Feb 17 '15 at 20:01
  • @Noobcanon: it looks like you are talking about systems built with no separation of concerns in mind. On the other hand, a clean architecture with clearly separated classes leads to narrowly focused unit tests. TDD encourages exactly this separation; you end up with mocks and stubs, which leads to a situation where a regression affecting a class is not propagated to other classes and cannot cause other classes' unit tests to fail. – Arseni Mourzenko Feb 17 '15 at 20:39
  • @Noobcanon A system being large has *nothing* to do with the amount of time a developer has to write unit tests. Like MainMa mentioned, if the system has good SoC, then unit tests won't overlap significantly. And the mere passage of time won't change that. And I never mentioned that writing proper unit tests guaranteed a good design. However, it does foster good SoC and adherence to the SRP. – MetaFight Feb 17 '15 at 21:27
  • MainMa, MetaFight: It's not realistic to assume that all systems will be built with separation of concerns in mind. More often than not, technical debt is created to speed up development time, and as things are built on top of technical debt, things get messier - especially unit tests. Basically, I just don't see the point. If you have acceptance tests and these are helping you to debug and develop quickly and the product owner is happy, then what's the benefit? I believe that acceptance tests will keep the code cleaner in a real-life situation anyway. – Noobcanon Feb 18 '15 at 20:10
  • @Noobcanon: so according to you, unit tests are useless in general because they don't work well for spaghetti code? If you build a mess with no separation of concerns in mind, the benefit (or the lack) of having unit tests shouldn't be your primary concern. But in this case, your original question should be rephrased like, say: “Is there any good in unit tests in a system with high technical debt and no separation of concerns, given that the system is already covered by acceptance tests?” – Arseni Mourzenko Feb 18 '15 at 20:14
  • @MainMa: Your separation of concerns and loosely coupled architecture is based on well structured features. Having a product owner and an architecture owner work together will result in useful acceptance tests and clean, maintainable code. – Noobcanon Feb 18 '15 at 20:14
  • @Noobcanon: note once again that TDD would have prevented (or mitigated the risk of) getting to the spaghetti state in the first place. When you do have proper unit testing and care about it, it's difficult to lose the original separation of concerns. – Arseni Mourzenko Feb 18 '15 at 20:14
  • @MainMa: Basically, with unit tests it's difficult to tell which unit tests, or which parts of unit tests, relate to which part of which feature, whereas acceptance tests relate directly, so changes are immeasurably more welcome. – Noobcanon Feb 18 '15 at 20:16
  • @Noobcanon Try working at Enterprise scale. All the same spaghetti patterns replicated with architecture: no clear golden sources of data, two systems doing the same thing because you bought that other company last week, fifty clients all interpreting your web API in different ways, those three clients who refuse to upgrade to the latest version so you have to support both... acceptance testing doesn't solve all the problems either. Sometimes it can be useful to add some examples to show how lower levels behave too, so that you can unpick these kind of inevitable messes. – Lunivore Feb 24 '15 at 00:50
  • Your first sentence isn't completely correct. [Martin Fowler says](http://martinfowler.com/bliki/UnitTest.html) "I often take a bunch of closely related classes and treat them as a single unit". In the area of unit testing, I believe him before you :) – gbjbaanb Feb 24 '15 at 08:51
  • Sorry for the off topic, what tool did you use to draw the graphs? – raisercostin Sep 10 '18 at 13:30
  • @raisercostin: I don't remember, and I can't find the originals. My guess would be either draw.io or Adobe Illustrator. In both apps, this can be done pretty easily. If you want to discuss further the diagrams, my e-mail address is in the description in my profile. – Arseni Mourzenko Sep 10 '18 at 20:30

There are several testing-related questions:

  • Which parts to test? Units (unit testing), whole system (functional), several systems together (integration)
  • Which aspects to test? Functionality, performance, reliability, security, etc.
  • When to do testing? Before (TDD/BDD) or after (non-TDD/BDD). Or never =)
  • How deep to test? Or, how to view the system under test? "Black box", "grey box", "white box" approaches.

Also, TDD and BDD are ways to design software and test it at the same time. Two birds with one stone!


Examples:

  • I have a REST API. I write tests using Cucumber, viewing the system as a black box, and I have a full spec which guides my testing process. This way I'm doing acceptance testing, but not BDD, since I do not design through testing. The design is in the spec. (A sketch of this follows the list.)
  • I have some rocket-science software and need to test a bunch of functions already written in Haskell. I write unit tests, but this is not TDD. It would be TDD if I wrote the tests up front.
  • I want to write yet another DOM manipulation framework in JS. I use the Jasmine framework to write tests, and I test the framework as a black box. In this case I am doing BDD, and it is also functional testing.
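As a sketch of what the first bullet might look like (the feature text, step definitions, URL and port are all hypothetical; the annotations are from Cucumber-JVM), the system is driven only through its public HTTP interface:

    // Hypothetical feature file (src/test/resources/features/registration.feature):
    //
    //   Feature: User registration API
    //     Scenario: Registering a new user
    //       Given no user exists with the email "ada@example.com"
    //       When I register a user with the email "ada@example.com"
    //       Then the response status is 201

    import io.cucumber.java.en.Given;
    import io.cucumber.java.en.Then;
    import io.cucumber.java.en.When;

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    import static org.junit.jupiter.api.Assertions.assertEquals;

    // Step definitions that exercise the system as a black box, exactly as the
    // spec describes it: nothing here knows about the implementation.
    public class RegistrationSteps {

        private final HttpClient client = HttpClient.newHttpClient();
        private HttpResponse<String> response;

        @Given("no user exists with the email {string}")
        public void noUserExists(String email) {
            // In a real suite this would reset test data; kept as a no-op here.
        }

        @When("I register a user with the email {string}")
        public void registerUser(String email) throws Exception {
            HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8080/users"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString("{\"email\":\"" + email + "\"}"))
                    .build();
            response = client.send(request, HttpResponse.BodyHandlers.ofString());
        }

        @Then("the response status is {int}")
        public void responseStatusIs(int expected) {
            assertEquals(expected, response.statusCode());
        }
    }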

Also I have some notes and clarifications about what you state:

  • Unit tests ensure things at a lower level are correct

    Kinda, but not really. They just show the presence or absence of known bugs. If you're interested in correctness, you may want to look at formal verification.

  • Unit tests speed up long-term development time

    Not always, because poorly-designed systems can be hard or impossible to test. In such cases I believe it is not worth it. That's why TDD was invented.

    If the system is testable, it is worth a lot. Errors cost much less when caught early; they can cost a great deal when they are discovered in production.

  • Unit tests ensure good architecture: No they don't.

    Indeed they don't. TDD and BDD do.

  • Integration tests ensure the overall performance is accurate and optimal

    Not only. Functionality and reliability too. Two systems may pass their "local" tests and yet fail to work together. Integration testing shows how well parts work together.

  • TDD ensures that when changes are implemented, nothing else breaks in the process

    Secretly, all types of automated testing do that: they reveal regression bugs. Module (or function, or system) A has tests over it; you change module B; A breaks; you see it through the tests on module A and fix module A.

scriptin

BDD actually started at the class level. JBehave was originally intended to be a replacement for JUnit.

The only meaningful difference between JBehave and JUnit back in 2004 was the removal of the word "test", and the use of "should" to drive out different aspects of behaviour of the class and encourage questioning of those aspects of behaviour ("should it?").

Conversations between Dan North (who created BDD) and Chris Matts (at the time, an analyst who was learning more about code and particularly mocks) revealed that the same patterns could be used at a system level, and the idea of using "Given, When, Then" to automate reusable steps was born.

Rather than thinking of them as "tests", Dan encouraged developers to think of them as examples of how a class behaved, or how a system behaved. We called the system ones "scenarios" mostly to differentiate from the class-level "examples", but both are synonyms.

The steps were created according to the context in which the class or system worked (the Given) and how those contexts changed the outcome (the Then). This pattern let people question whether the contexts needed to be considered, or whether an outcome was desired or not.

The different levels are appropriate for different audiences. Conversations about class behaviour typically take place between developers, while conversations about system behaviour involve the "three amigos" of tester, developer and business expert.

It's often the case that developers are aware of more requirements than the business, from stakeholders who aren't always present. For instance, devs will normally consider maintainability, performance, the APIs of third-party or legacy systems, themselves coming back to the codebase in 3 years' time, new joiners coming on board, etc.

All of these are parts of the requirements which the business often don't care about, as long as it works. Talking through the behaviour at a technical level can give the devs a chance to question whether what they're doing is appropriate.

Some small projects may not see much benefit from this, particularly if the functionality of the project is simple and predictable. Large projects benefit from the "Test Pyramid": few full-stack system tests, more integration tests, and even more class-level tests. See Lisa Crispin and Janet Gregory's "Agile Testing" for more information.

As an example, this is a class-level test which uses "Given, When, Then" in comments to illustrate the waiter's responsibilities and behaviour (C# but readable).
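That linked example isn't reproduced here, but purely as a hypothetical illustration of the style (in Java rather than the C# of the original, with a made-up Waiter class and Mockito for the collaborator), such a class-level example might read like this:

    import org.junit.jupiter.api.Test;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.verify;

    // Hypothetical collaborator and class under test, invented for illustration.
    interface Kitchen {
        void prepare(String dish);
    }

    class Waiter {
        private final Kitchen kitchen;

        Waiter(Kitchen kitchen) {
            this.kitchen = kitchen;
        }

        void takeOrder(String dish) {
            kitchen.prepare(dish);
        }
    }

    // "Should" naming instead of "test", as in early class-level BDD.
    class WaiterShould {

        @Test
        void passTheOrderToTheKitchen() {
            // Given a waiter working with a kitchen
            Kitchen kitchen = mock(Kitchen.class);
            Waiter waiter = new Waiter(kitchen);

            // When a customer orders soup
            waiter.takeOrder("soup");

            // Then the waiter should ask the kitchen to prepare it
            verify(kitchen).prepare("soup");
        }
    }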

I find it helps if I don't think of them as tests, though. I think of them as examples which illustrate how the system behaves. The idea isn't to catch bugs; it's to prevent bugs from scenarios which haven't been considered from being there in the first place. Any regression bug is usually a symptom of poor design, and benefits from class-level refactoring and testing instead of shoving another system-level scenario into the mix.

Lunivore
  • I agree. I'd just like to add that perhaps developers being more aware of many technical requirements is a symptom of over-complicated system design, and that BDD could be more than just tests; it could promote a simpler, more straightforward requirement/development cycle. – ThreaT Feb 24 '15 at 20:34
  • @ThreaT I want it to be that. I try to actually treat things like performance and scalability as requirements, and monitor wherever we can't test: http://lizkeogh.com/2014/02/10/discrete-vs-continuous-capabilities/ - deliver the monitoring using the same principles. This helps devs know exactly what they need to achieve so they don't over-engineer or have to hack it in at the last minute. Tom Gilb has a lot of stuff in his work on Evo on quantifying functionality like this more precisely too. I think devs are more aware because historically they had to be. – Lunivore Feb 24 '15 at 20:47
  • I really appreciate where you're going with this. I believe this is the correct answer to the question. It would be great if there was more information available somewhere on some techniques that can be used to push things in this direction. If you have any more useful resources for this purpose, perhaps you could add them to your answer? – ThreaT Feb 25 '15 at 12:16
  • @ThreaT I could, but it would be a bit self-serving since most of them are on my blog: http://lizkeogh.com/category/bdd/ - try also any answers I've given here to StackOverflow BDD questions, or on http://programmers.stackexchange.com. Older stuff tends to be more code-focused than newer. And thank you for the kind feedback! – Lunivore Feb 25 '15 at 15:07