1

It seems the general consensus for unit testing classes is to test your object through its public interface only. So if you wanted to test the removeElement method on a LinkedList class you'd need to call addElement, then removeElement, and lastly containsElement to assert the element was removed.
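In code, that black-box sequence might look like this (a sketch in Python for brevity; the `LinkedList` class and its method names are hypothetical stand-ins, not a real library):

```python
class LinkedList:
    """Minimal hypothetical singly linked list, used only to illustrate the test style."""

    class _Node:
        def __init__(self, value, next=None):
            self.value = value
            self.next = next

    def __init__(self):
        self._head = None

    def addElement(self, value):
        # Prepend a new node holding the value.
        self._head = LinkedList._Node(value, self._head)

    def removeElement(self, value):
        # Unlink the first node holding the value, if any.
        prev, node = None, self._head
        while node is not None:
            if node.value == value:
                if prev is None:
                    self._head = node.next
                else:
                    prev.next = node.next
                return
            prev, node = node, node.next

    def containsElement(self, value):
        node = self._head
        while node is not None:
            if node.value == value:
                return True
            node = node.next
        return False

def test_remove_element_black_box():
    # Black-box style: removeElement is exercised only through the public API,
    # so the test also depends on addElement and containsElement being correct.
    lst = LinkedList()
    lst.addElement(42)
    lst.removeElement(42)
    assert not lst.containsElement(42)

test_remove_element_black_box()
```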

This is brittle because if addElement or containsElement broke, the test would fail even if the implementation of removeElement is correct.

When I test standalone procedures I try to call them in isolation. If I were to test a removeElement procedure I would build up the state of the parameters directly in the test and then assert their state post call is correct.

The only difference between a method and a procedure is that a method is implicitly given the object as a parameter. So since list.removeElement(el) and removeElement(list, el) are functionally the same, why not test them the same way? E.g., in the test, create an instance of the LinkedList class, set up its "private" fields, call removeElement, and assert its fields changed correctly post-call.

This is the ideal unit test because it's about taking input and asserting output for a single unit of functionality. Having to call public methods A, B, C, D, E, and F just to test method G is a borderline integration test, can potentially create false positives (since the data itself is never validated), and makes isolating the failure of the test during maintenance more difficult.

Anecdotally, I've found that black box testing tempts developers to add unnecessary public methods to make their testing "easier", which increases maintenance in the long run.

So my question is why is white box testing discouraged in the OO world when it seems like common sense in the procedural and functional worlds?

EDIT: Is there an OO way of dealing with the gripes I've outlined in my post, specifically in the latter half, that does not involve adding new public methods and avoids calling public methods other than the one being tested? Consider the dilemma of asserting the "previous" node pointer in my comment.

EDIT #2: Apparently my concept of a class might be different from others'. A class to me is just a (very old) design pattern: "construct", "consume" (e.g. call methods), and "destruct", which is no different than fopen, fread, fwrite, fseek, and fclose in C. Regardless of whether there is an implicit parameter involved, things are stuffed behind a namespace, or you call it private, public, or protected, everything is just data and data transformations at the end of the day. I'm having trouble grasping classes as a unit when a class seems more like a design pattern or even "container" for the actual units, which are the functions themselves.
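The equivalence between the two styles can be sketched like this (hypothetical names throughout; a plain dict stands in for a C-style struct):

```python
# "Procedural" style: the state is an explicit first parameter.
def make_list():
    return {"items": []}

def add_element(lst, value):
    lst["items"].append(value)

def remove_element(lst, value):
    lst["items"].remove(value)

# "OO" style: the same data and transformations, with the state
# passed implicitly as self instead of explicitly as a parameter.
class LinkedList:
    def __init__(self):
        self.items = []

    def add_element(self, value):
        self.items.append(value)

    def remove_element(self, value):
        self.items.remove(value)

# Functionally equivalent "construct, consume, destruct" life cycles:
lst = make_list()
add_element(lst, 1)
remove_element(lst, 1)

obj = LinkedList()
obj.add_element(1)
obj.remove_element(1)
```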

  • 6
    "Anecdotally I've found that black box testing tempts developers to add unnecessary public methods to make their testing "easier" ...." Isn't this exactly what you are suggesting by "setting up private fields"? – Euphoric Jun 18 '17 at 18:12
  • 1
    And I don't think accessing "encapsulated" fields in structures is common sense in "procedural" style and it is not a problem with immutable structures in functional style. – Euphoric Jun 18 '17 at 18:19
  • Thanks for the response @Euphoric! There is a difference between the team who owns the code accessing the internals for testing versus a consumer having access to new public methods that were added to make the team's testing easier. The former doesn't affect the consumer whereas the latter does by expanding the public API. –  Jun 18 '17 at 19:02
  • @Euphoric You should be able to test functions one at a time regardless of whether it's a procedural or functional language. Immutability makes it easier in a functional language; however, a complex procedure that's difficult to test by itself can always be split up - much like a complex class can be split up. –  Jun 18 '17 at 19:08
  • And how would you - in your approach - make sure that you set up the object correctly and the unit test is not failing due to an error in your setup? Furthermore, in your example of removeElement, you'd probably replicate a lot of the code you wrote for addElement in the setup of your test. – MikeMB Jun 18 '17 at 19:09
  • @MikeMB Great question. There may be some duplication; however, that can be cut down with helper functions written alongside the tests. You can also split up a function if it becomes unwieldy - much like a class can be split up. This is a gray area because now you're modifying (what potentially might be) the public API much like an OO developer might add new methods just to make testing easier. –  Jun 18 '17 at 19:18
  • 5
    *"You should be able to test functions one at a time"*. I'd actually call that a (widely spread) antipattern that is impossible to achieve anyway, because in each test you will at least call a constructor. Also it generally doesn't make sense to test functions that are supposed to work with each other only in isolation (think e.g. about push and pop). Unit tests are not a replacement for a debugger. – MikeMB Jun 18 '17 at 19:18
  • @MikeMB It shouldn't be impossible to achieve. Functions take input and produce output - in OOP a method is the same as a function except it's given an implicit `self` parameter, which can be viewed no different from any other parameter. As far as constructors go, you'd have to change the "inputs" after the constructor is called. If you exclusively test the interface you end up with the problems I describe in my original post. –  Jun 18 '17 at 19:28
  • 3
    I think this boils down to a question of encapsulation. You seem to believe that it should be fine for the internals of a class to be exposed to a test, which I highly disagree with, as it would make refactoring harder and would require future maintainers of the tests to know not only the API of the class, but also its internals. – Euphoric Jun 18 '17 at 19:32
  • @Euphoric Can you help me understand how you'd deal with the gripes I've outlined in my posts without accessing any private state, methods, or adding new public methods? Ultimately I'd like an OO solution for calling a method in isolation and verifying the result is correct - if that's possible - because otherwise I don't see how you'd avoid the problems I've outlined so far. Thanks again for your help and viewpoints. –  Jun 18 '17 at 19:59
  • 1
    I don't consider them gripes. And even if I would, they would be less worrisome than leaking privates outside of class. Also, I don't consider it is possible to isolate method itself. Unit test in OOP is a class, not a method. And single unit test tests single behavior of class, not single method. And behavior of class might require calls of multiple methods. – Euphoric Jun 18 '17 at 20:02
  • In the "previous pointer" case, are you saying that it is impossible to assert the private state using only public methods? Or are you saying it is possible, but it is so complex (and requires calls of multiple public methods multiple times), that it is better to just access the private state directly? – Euphoric Jun 18 '17 at 20:18
  • @Euphoric A class to me is just a (very old) design pattern: "construct", "consume" (e.g. call methods), and "destruct" which is no different than `fopen`, `fread`, `fwrite`, `fseek`, and `fclose` in C. Regardless of whether there is an implicit parameter involved, things are stuffed behind a namespace, or you call it private, public, or protected, everything is just data and data transformations at the end of the day. I'm having trouble grasping classes as a unit when it seems more like a design pattern or even "container" for the actual units: the functions themselves. –  Jun 18 '17 at 20:18
  • Now THAT is an important point, as it is completely different from how I (and I believe many others) see classes. So you should first make it clear what a class is and what its design considerations are. – Euphoric Jun 18 '17 at 20:24
  • @Euphoric For the pointer example, it's certainly possible to test it with the public interface alone, but it becomes complex, and this complexity can obfuscate things to maintainers - thus wasting engineering time (not to mention false positives). Exclusively calling the functionality being tested would yield a much simpler, clearer, and easier to maintain test IMO. It may not always be easy, but I'm wondering why this isn't the "go-to" solution? Of course, a mix of white box and black box tests would provide the most comprehensive test suite. –  Jun 18 '17 at 20:25
  • @Euphoric Also, when testing the pointer example, it might not be clear what's actually being tested depending upon how the test is written; for example, you might just assert no exception was thrown and call it good - but that's not very "specific". What's "specific" is asserting the data is in the state it should be. My 2 cents. –  Jun 18 '17 at 20:30
  • Don't forget, that no test can pinpoint the exact error location in the code anyway. So this is not a matter of black and white, but different shades of grey. Individual functions might seem like a natural unit test granularity, but that doesn't mean they always represent a sweet spot between error isolation and test/refactoring overhead. Also, if you test the value of each member variable after each function call, you are effectively specifying implementation details, which very quickly results in tests that just replicate your code instead of verifying that it fulfills your requirements. – MikeMB Jun 19 '17 at 17:24
  • As a final thought: If you do want to specify certain implementation properties, then I'd use asserts for that. They give you even finer granularity than "one-unit-test-per-function" and allow easier refactoring because it is usually obvious which parts of the test code have to change to mirror the changes in your production code. – MikeMB Jun 19 '17 at 17:29
  • The discussions here might be interesting for you: https://softwareengineering.stackexchange.com/questions/100959/how-do-you-unit-test-private-methods?noredirect=1&lq=1 – MikeMB Jun 19 '17 at 21:15
  • It's still a unit test. You're expecting the unit that is being tested to be a method, but in this case, the unit is a class. This is not a problem as long as the class isn't huge. An integration test is when you combine several units (*whatever you've previously decided a unit is*). – user253751 Jun 20 '17 at 02:21
  • @immibis Exactly, it's a matter of perspective which I [explained here](https://softwareengineering.stackexchange.com/a/351207/275704). –  Jun 20 '17 at 18:11

2 Answers

14

You are mixing up two related, but nevertheless different things:

  • white box testing

  • unit testing by using private methods

The reasons for writing or not writing unit tests using only public methods have been discussed numerous times before on this site, for example here or here. I don't think it makes sense to repeat those arguments; if that is your question, you will probably find an answer by following those links.

White box testing, however, does not mean using private members to set up a test. It means designing tests using specific knowledge about the internals of the tested class or component - for example, by creating tests to achieve full code coverage and/or branch coverage - and this is typically done using just public members. So white box testing requires knowing the internals of a class, but does not directly utilize access to those internals. This leaves the designer of a component in a situation where he can still change the implementation details without worrying too much about the tests.
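For example, a tester who knows that removeElement internally branches on "head node" vs. "inner node" would add a case for each path, yet still drive everything through public methods (a sketch; the implementation shown is a hypothetical stand-in, with nodes encoded as `[value, next]` pairs):

```python
class LinkedList:
    """Hypothetical node-based list whose internals motivate the test cases."""
    def __init__(self):
        self._head = None

    def addElement(self, value):
        self._head = [value, self._head]   # prepend a [value, next] pair

    def removeElement(self, value):
        prev, node = None, self._head
        while node:
            if node[0] == value:
                if prev is None:           # branch 1: removing the head node
                    self._head = node[1]
                else:                       # branch 2: removing an inner node
                    prev[1] = node[1]
                return
            prev, node = node, node[1]

    def containsElement(self, value):
        node = self._head
        while node:
            if node[0] == value:
                return True
            node = node[1]
        return False

# White-box-chosen cases, exercised purely through the public API:
def test_remove_head_branch():
    lst = LinkedList()
    lst.addElement(1)
    lst.addElement(2)        # 2 is now the head (added last)
    lst.removeElement(2)
    assert not lst.containsElement(2) and lst.containsElement(1)

def test_remove_inner_branch():
    lst = LinkedList()
    lst.addElement(1)
    lst.addElement(2)        # 1 is now the inner node
    lst.removeElement(1)
    assert not lst.containsElement(1) and lst.containsElement(2)

test_remove_head_branch()
test_remove_inner_branch()
```

The knowledge of the internals only guides *which* cases exist; no test touches a private field.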

This kind of testing is not discouraged in OOP, quite the opposite. The well-known Test Driven Development (which is popular in OOP as well as in non-OOP) is a form of testing which actually leads to these kinds of tests: whenever one wants to add a new feature to a function, class or component, one writes a "red" test first, then adds some new code or changes some existing code to add the feature, and once the new test becomes "green", it is obvious the added or changed code must have been covered by the test.
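A minimal sketch of that red/green cycle (all names hypothetical):

```python
# Step 1 ("red"): write the test for the new feature first.
# At this point LinkedList does not exist yet, so the test fails.
def test_contains_after_add():
    lst = LinkedList()
    lst.addElement(7)
    assert lst.containsElement(7)

# Step 2 ("green"): add just enough code to make the test pass.
class LinkedList:
    def __init__(self):
        self._items = []

    def addElement(self, value):
        self._items.append(value)

    def containsElement(self, value):
        return value in self._items

test_contains_after_add()   # now green: the new code is covered by the test
```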

To your example: if removeElement were a public method of a "list" module, and not a member of a class, I would still recommend testing using only the public interface of that module, just as if it were a class. Your example of a broken addElement or containsElement is contrived (and your idea of avoiding calls to public methods other than the one being tested is - no offence - misguided). In reality, one would design such a test by

  • creating a new list
  • asserting the list does not contain element X
  • adding an element X to the list
  • asserting the list now contains element X
  • removing the element X from the list
  • asserting the list does not contain element X any more

which is all possible using public methods.

If addElement or containsElement were broken, the above test sequence makes sure the test will reveal this (and does not give a false positive for removeElement).

Of course, there are cases of classes or complex components where using the public interface alone might not be the best approach to create a full test scenario, and where it can be helpful to loosen the encapsulation to some degree, for example, by adding "maintenance hatches" into the code. But I think such cases are exceptional cases, and good test and component design should try to avoid these situations. A simple component like a linked list should not require such measures.

Doc Brown
  • 1
    "one would design such a test by ... " Yes, but what Zippers is saying is that such a test would break if any of the public methods broke, not only if `remove` broke. Which is what seems to be bothering him. – Euphoric Jun 18 '17 at 18:31
  • 2
    @Euphoric: to me it seems the OP is bothered more by the fact a test by using just `add`, `remove` and `contains` can result in a false positive, which can be mitigated by adding `assert` statements after each step. – Doc Brown Jun 18 '17 at 18:36
  • I think his idea is that in ideal scenario only one test would fail, making it clear WHAT broke, while in your scenario, you would have majority of tests failing and thus requiring more time, even with additional asserts. – Euphoric Jun 18 '17 at 18:39
  • Thank you for the responses! White box testing [per-wikipedia](https://en.wikipedia.org/wiki/White-box_testing) is "a method of testing software that tests internal structures or workings of an application, as opposed to its functionality." Code coverage is one interpretation of that, but code coverage only helps assert control flow - there is no data assertion. I'd prefer not to let this discussion get lost in semantics :) –  Jun 18 '17 at 18:52
  • @Euphoric that is whats bothering me. If you wanted to test removing the middle element from a doubly linked list how would you assert the "previous" pointer on the next node was correct after the removal? Calling containsElement or numElements won't cut it since they only follow the "next" pointer. You'd have to call methods in a specific order to try to induce an exception to catch issues like this...and the complexity will only grow for more complex data structures. Asserting data directly would be less brittle and more clear to maintainers then playing games with the call order. –  Jun 18 '17 at 18:54
  • 6
    @Zippers *"If you wanted to test removing the middle element from a doubly linked list how would you assert the "previous" pointer on the next node was correct after the removal?"* Unit testing does not verify that. Unit testing verifies that the iteration skips the removed element or that it is inaccessible through the public interface, because this is the expected *behavior*. That pointers are involved which have to be adjusted is *state* or *implementation detail* which is *not verified*. This way the implementation detail can change (improve) without breaking the test. – Timothy Truckle Jun 18 '17 at 21:12
  • @Zippers: that Wikipedia article also says *"The tester chooses inputs to exercise paths through the code and determine the appropriate outputs."* - IOW, tests for code & branch coverage. – Doc Brown Jun 18 '17 at 21:12
  • @DocBrown TDD does have a different approach: it does not aim to *code* coverage, but to *requirement coverage*. – Timothy Truckle Jun 18 '17 at 21:14
  • @TimothyTruckle: IMHO it is debatable if that is a really different thing. My point is: TDD is in my POV first and foremost a whitebox implementation & testing technique, the tests are choosen with the exact knowledge which things are already internally implemented and which are missing. – Doc Brown Jun 18 '17 at 21:18
  • @DocBrown _The tester_ could be the unit test writer _chooses inputs_ can be preparing the input for the function _to exercise paths through the code_ can mean calling the function _and determine the appropriate outputs_ is asserting the output. The definition could also be interpreted as code coverage in some circles as well. I'd rather not devolve the conversation into semantics though. –  Jun 18 '17 at 21:23
  • @Zippers: I still think your doubly linked list example is contrived. A doubly linked list needs to provide methods for navigating forward and backward through its elements. Using these tools, I think it will be pretty simple to create a non-brittle test using just public methods without making any assumptions about the internal implementation details. – Doc Brown Jun 18 '17 at 21:27
  • @TimothyTruckle Thank you for the response! If the unit test doesn't verify the dangling pointer, then you're missing a critical defect. At which phase during testing would you catch an issue like this? EDIT: You beat me to the response :) What kind of non-brittle test and tools do you mean? –  Jun 18 '17 at 21:28
  • @TimothyTruckle I think I see what you mean: the _tool_ being some other method which provides navigation. That method may or may not exist. The developer might have felt like implementing a doubly linked list internally for what's billed as a singly linked list - after all, that's the benefit of objects, right - swapping out the implementation details without breaking the interface. This example is getting contrived. It seems the answer to my question is turning out to be that OO best practice dictates that tests should not access internals, thus living with unverified state. –  Jun 18 '17 at 21:44
  • TDD using private methods does tend to destroy encapsulation, making it harder to do future development because internal dependencies are multiplied. In that sense, at least, TDD makes OOP harder. Some people do not discourage this type of testing; I do, having seen how it does not scale. – Frank Hileman Jun 18 '17 at 22:22
  • The problem with TDD is that what should be private tends to be made public. So we can say we are only testing public methods, but in reality, we made private methods public. – Frank Hileman Jun 18 '17 at 22:23
  • @FrankHileman: I think you have a point. But does TDD really make OOP harder, or does OOP make it harder to apply TDD? Or does TDD for OO code require more thinking about a good public interface, which is inherently harder? – Doc Brown Jun 19 '17 at 10:06
  • 2
    The quoted Wikipedia definition seems to be wrong. White-box testing doesn't test internals etc. but uses knowledge of the implementation to determine inputs that are more likely to cause trouble than others. Best or worst case depending on your point of view would be me reading the source code, finding a bug, and writing a test that I know exhibits the bug and fails - a test that would have been impossible to create with black-box testing. – gnasher729 Jun 19 '17 at 11:20
  • @gnasher729: yes, but you surely wanted to address your comment to the OP, not to me. – Doc Brown Jun 19 '17 at 12:01
  • 2
    @Zippers Well, it doesn't matter much if the internal pointer is set wrong if the actual traversal of the list provides the right order. Assume my double-linked list looks like A-B-C with an interface with add/remove/next/previous functions. If I can call remove(B) while at A, then call next() and be at C, then call previous() and be at A, then the implementation details don't matter. The thing works, and tests exist to know that behavior functions, not to verify that the code hasn't changed one bit (like if we change nextPtr and prevPtr to a tuple ptrs=(next,prev), your test would break). – Delioth Jun 19 '17 at 19:16
  • I'm accepting this answer because it appears to be the most popular opinion among OO developers and its likely the answer they would expect a fledgling OO developer to receive. –  Jun 21 '17 at 00:34
  • @DocBrown Generally speaking, the best designs using encapsulation (a fundamental part of the definition of OO) minimize the number of public APIs. This extends to both classes and assemblies (libraries). I.e., we try to minimize both the internal API between classes in an assembly, and the public API exposed by the assembly to other components in a program. This approach is fundamentally at odds with the TDD design technique whereby components must be tested before being developed. My own experience is with large TDD projects, all of which had poor design from an encapsulation perspective. – Frank Hileman Jun 26 '17 at 17:20
  • @FrankHileman: sure, but if I got you right, this has nothing to do with "whitebox tests being discouraged by OOP". With or without TDD, I think whitebox tests and OOP mix up perfectly, opposed to tests using private methods, but that is a different thing. – Doc Brown Jun 26 '17 at 19:23
  • @DocBrown Yes, I had assumed that "whitebox" was a mistake, and the reference was to a lack of encapsulation. Whitebox simply means knowledge, whereas encapsulation means limited access. – Frank Hileman Jun 26 '17 at 23:35
0

I think what's going on here is a difference of perspective: do you view classes or functions as the smallest unit? I'm going to deconstruct my question from the perspective of someone who views classes as the smallest unit:

[The] ideal unit test ... is about taking input and asserting output for a single unit of functionality. Having to call public methods A, B, C, D, E, and F just to test method G is a borderline integration test.

This makes no sense because the class is the smallest unit, therefore calling multiple methods isn't an issue because it tests the unit as a whole.

[It] can potentially create false positives (since the data itself is never validated).

It's not about data confidence; it's about the interface and whether it appears to behave correctly.

Isolating the failure of the test during maintenance is more difficult.

If a test fails, it doesn't matter which method induced the failure, because the unit as a whole is broken. Difficulty in finding the root cause is accepted because the resulting fix repairs the unit as a whole.

If you wanted to test the removeElement method on a LinkedList class you'd need to call addElement, then removeElement, and lastly containsElement to assert the element was removed. This is brittle because if addElement or containsElement broke, then the test will fail even if the implementation of removeElement is correct.

It's not about a method failing; it's about the class as a whole failing. Testing a specific method in isolation doesn't make sense. It's best to test the class as a whole through workflow or requirement tests.

Now I'll state the perspective of someone viewing functions as the smallest unit. I think I described it best in edit #2 (I've modified the wording slightly):

A class is a design pattern: "construct", "consume", and "destruct" which is no different than fopen, fread/fwrite/fseek, and fclose in C. Regardless of whether there is an implicit self parameter, definitions are placed in a namespace, or you call it private, public, or protected everything is data and data transformations at the end of the day.

From this perspective there is nothing wrong with establishing data inputs directly during testing. If you don't, you end up playing games with the function call sequence to nudge the data (which you weren't supposed to know about) into a state which you can indirectly verify; this sacrifices direct data confidence and test clarity.
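To make this concrete with the doubly linked list "previous" pointer dilemma from the comments (a sketch; the node fields and the implementation are hypothetical):

```python
class Node:
    """Hypothetical doubly linked node."""
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

class DoublyLinkedList:
    def __init__(self):
        self.head = None

    def removeElement(self, value):
        node = self.head
        while node is not None:
            if node.value == value:
                if node.prev is not None:
                    node.prev.next = node.next
                else:
                    self.head = node.next
                if node.next is not None:
                    node.next.prev = node.prev
                return
            node = node.next

def test_remove_middle_fixes_prev_pointer():
    # Build A-B-C by hand, remove B, and assert BOTH links directly -
    # including the "previous" pointer that forward-only public methods
    # like containsElement can never reach.
    a, b, c = Node('A'), Node('B'), Node('C')
    a.next, b.prev, b.next, c.prev = b, a, c, b
    lst = DoublyLinkedList()
    lst.head = a
    lst.removeElement('B')
    assert a.next is c
    assert c.prev is a   # would dangle if removeElement forgot to fix it

test_remove_middle_fixes_prev_pointer()
```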

There seems to be a divide regarding whether engineers should think in terms of data and data transformations or abstractions, and at what level during development and testing. This raises the question as to why this divide exists. It's an important question because it has a ripple effect on how engineers view, design, and test software. I'm tempted to post a separate question because I believe it's worth asking.

  • `fopen`, `fread` etc. are operations on a file, using an opaque `FILE` pointer. You surely do not want to write any tests utilizing knowledge about the internals of `FILE`. The file abstraction in C is IMHO not different from a file abstraction by a separate file class; in C the smallest sensible testable unit for a file is the data type `FILE` together with its operations. So I don't see how writing unit tests for an equivalent `File` class (like the `File` classes in Java or C#) should be quite different from writing unit tests for file operations in C. – Doc Brown Jun 20 '17 at 12:01
  • ... and you will also need to use multiple of these methods in a test. If you want to test `fread`, you need to utilize `fopen`. For a meaningful test of `fseek`, you will probably utilize `fread` to verify you reached the intended file position. – Doc Brown Jun 20 '17 at 12:09
  • I appreciate your feedback @DocBrown but you're getting hung up on semantics and examples and not addressing the core ideas: Do you believe software is about data and data transformation? Do you believe classes are an implementation of the "life cycle" pattern? –  Jun 20 '17 at 16:18
  • Don't get me wrong, but as long as your examples seem to prove the opposite of what you are describing elsewhere, and as long as your usage of terms like "whitebox testing" looks to be based on a misunderstanding or misconception, it is hard for me to understand what your core question is about. Surely software is about "data and data transformation" from some point of view, at the right level of abstraction, and I have only a vague idea what you mean by "life cycle pattern". But it seems you are drifting away from the original question, and this is a Q&A site, not a discussion forum. – Doc Brown Jun 20 '17 at 18:44
  • @DocBrown The point I was making is that how engineers view software has a ripple effect on how they design, **test**, and recommend test strategies. I _suspect_ the emphasis on behavioral testing is driven by business needs, cost, engineering time, and quality assurance. To businesses, behavioral testing may be "good enough", but different industries and applications may need different levels of validation and correctness. –  Jun 20 '17 at 19:12
  • @DocBrown A separate question I'm planning on asking is with regards to what kind of trials and studies have been done in the field of software engineering to objectively compare test strategies. I've been searching this site trying to find answers to test questions that reference studies or hard data, but most everything seems to be opinion driven. –  Jun 20 '17 at 19:18
  • Beware, asking "how engineers view software" sounds to me like a philosophical question, which makes it a candidate for being quickly closed either as "too opinionated" or "too broad". Asking for "trials and studies" will be closed by the community, too, since asking for third-party resources is explicitly off-topic for this site. Better stick to a narrower focus. – Doc Brown Jun 20 '17 at 21:56
  • The rule about opinions sounds strange to me since no one is providing citations resulting in "answers" that _appear_ to be opinion; the "like" and "dislike" mechanism _appears_ to be a popularity contest. This leads me to wonder why our field doesn't attempt to quantify methodology. Why not verify the total defect count is reduced by N% when using methodology A vs B? This very much seems like a question directed at the core of software engineering because without quantifying then everything appears to be opinion. I'll keep the rules in mind if I ask this question. Thanks again. –  Jun 21 '17 at 00:09