30

Here's a skeleton of a class I built that loops through and deduplicates data - it's in C# but the principles of the question aren't language specific.

public static void DedupeFile(FileContents fc)
{
    BuildNameKeys(fc);
    SetExactDuplicates(fc);
    FuzzyMatching(fc);
}

// algorithm to calculate fuzzy similarity between surname strings
public static bool SurnameMatch(string surname1, string surname2)

// algorithm to calculate fuzzy similarity between forename strings
public static bool ForenameMatch(string forename1, string forename2)

// algorithm to calculate fuzzy similarity between title strings
public static bool TitleMatch(string title1, string title2)

// used by above fn to recognise that "Mr" isn't the same as "Ms" etc
public static bool MrAndMrs(string title1, string title2)

// gives each row a unique key based on name
public static void BuildNameKeys(FileContents fc)

// loops round data to find exact duplicates 
public static void SetExactDuplicates(FileContents fc)

// threads looping round file to find fuzzy duplicates
public static void FuzzyMatching(FileContents fc, int maxParallels = 32)

Now, in actual usage only the first function actually needs to be public. All the rest are only used inside this class and nowhere else.

Strictly that means they should of course be private. However, I've left them public for ease of unit testing. Some people will no doubt tell me I should be testing them via the public interface but that's partly why I picked this class: it's an excellent example of where that approach gets awkward. The fuzzy matching functions are great candidates for unit tests, and yet a test on that single "public" function would be near-useless.

This class won't ever get used outside a small team at this office, and I don't believe that the structural understanding imparted by making the other methods private is worth the extra faff of packing my tests with code to access private methods directly.

Is this "all public" approach reasonable for classes in internal software? Or is there a better approach?

I am aware there is already a question on How do you unit test private methods?, but this question is about whether there are scenarios where it's worthwhile bypassing those techniques in favour of simply leaving methods public.

EDIT: For those interested, I added the full code on CodeReviewSE as restructuring this class seemed too good a learning opportunity to miss.

Bob Tway
  • 3,606
  • 3
  • 21
  • 26
  • 11
    Have you considered making the methods Internal? And add an InternalsVisibleTo attribute, that points to your Unit Test project. – Falgantil Jun 06 '16 at 09:18
  • 1
    @BjarkeSøgaard No! I'd never come across that attribute before but it sounds like it would solve my specific problem really well. So thanks very much for that. Going to leave the question here as it still seems to have value for other languages. – Bob Tway Jun 06 '16 at 09:22
  • How would it be "near-useless"? If you can achieve all functionality via that method, it should be perfectly valid. – Chris Wohlert Jun 06 '16 at 09:25
  • @ChrisWohlert Yes, the class could be tested that way. However, it would mean mocking up a whole "FileContents" DTO as opposed to just a couple of strings. And it wouldn't be much of a "unit" test any more, because the testing would no longer be targetted at small blocks of code. This is my question - is it worth the trade off? – Bob Tway Jun 06 '16 at 09:29
  • 5
    @MattThrower, Hi Matt, sorry, but I edited my comment after reading your question a bit more carefully. When thinking of a unit test, a "unit" != small amount of code, it means "one unit of work". A test should test a feature, not a method. I.e `TestDedupeFile_WhenCalledWithFuzzyTitles_MatchesThem`. You would have as many tests for the public method, as you would if you wrote one for each private method. – Chris Wohlert Jun 06 '16 at 09:35
  • 1
    As to; Is it worth the tradeoff? I don't know about the mocking part, how much work it implies. However, when a developer sees this code, he should be able to make changes to the privates of the class as he see fit, whether that be splitting one method into two, or two into one. – Chris Wohlert Jun 06 '16 at 09:37
  • 1
    [Related](http://programmers.stackexchange.com/a/319893/73508) – Robbie Dee Jun 06 '16 at 14:37
  • Same question on Stackoverflow: http://stackoverflow.com/questions/7075938/ – BlueRaja - Danny Pflughoeft Jun 06 '16 at 16:15
  • A class that has one public function and many private functions is degenerate. Refactor so you have a class with a normal interface, which can then be aggregated privately by something else. – isanae Jun 06 '16 at 18:17
  • In Swift 2, you can add an attribute @testable to a method or variable, which means the compiler allows you to access it _from unit tests_ even if it is private, making the problem go away. – gnasher729 Jun 06 '16 at 22:57
  • 1
    [Questions like this have been asked dozens of times](http://programmers.stackexchange.com/search?q=test+private+method+is%3Aquestion). While the topic is interesting and useful, it is also old and tired. –  Jun 07 '16 at 17:52
  • If you feel the need to test private methods, it means your class is doing too much. Split it into mutliple classes, and inject the new classes into the parent using dependency injection. You'll be able to test public methods of the split classes. – Eternal21 Jun 07 '16 at 19:54

8 Answers8

64

I've left them public for ease of unit testing

Ease of writing those tests, maybe. But you are then tightly coupling that class to a bunch of tests that interact with its inner workings. This results in brittle tests: they will likely break as soon as you make changes to the code. This creates a real maintenance headache, that often results in people simply deleting the tests as they become more trouble than they are worth.

Remember, unit testing doesn't mean "testing the smallest possible piece of code". It's a test of a functional unit, ie for a set of inputs into part of the system, I expect these results. That unit might be a static method, a class, or a bunch of classes within an assembly. By targeting public APIs only, you mimic the behaviour of the system and so your tests become both less coupled and more robust.

So make the methods private, mock the "whole 'FileContents' DTO" and only test the one true public method. It'll be more work initially, but over time, you'll reap the benefits of creating useful tests like this.

David Arno
  • 38,972
  • 9
  • 88
  • 121
  • In my experience you have a choice between too many things public, writing only integration instead of unit tests or testing internals. (Though the last option is usually limited to public methods on internal classes) – CodesInChaos Jun 06 '16 at 12:55
  • 11
    @CodesInChaos, whether a test is a unit or integration test is a matter of semantics in my experience. If tests can be run in parallel, without side effects, then they are unit tests, even if they result in the entire application running. Testing internals still leads to brittle tests: internals should be free to change in any way, which cannot happen if tests get their sticky paws on them. I have just been through an exercise of removing some sixty internal classes in a library I created. Because my tests only hit public methods, all 600+ of them still work properly. – David Arno Jun 06 '16 at 13:03
  • 5
    @DavidArno _"If tests can be run in parallel, without side effects, then they are unit tests"_ - great definition! – Lovis Jun 06 '16 at 14:46
  • 2
    @DavidArno Great first sentence. Taking the extra hour to do things the right way will pay off not only with the maintainability benefits you describe, but also make it so doing the right thing only takes 45 extra minutes next time and eventually becomes second nature. – Josh Rumbut Jun 06 '16 at 15:11
  • Thanks for all the contributions. I'm accepting this as the answer as it contains the clearest explanation of why it's a better idea to test only public methods. – Bob Tway Jun 06 '16 at 15:57
  • Use **`@Jailbreak`** from the [Manifold](http://manifold.systems/) framework so your tests can *directly* and *type-safely* access private members without compromising design or overexposing members for the sake of test cases. – Scott Mar 17 '19 at 00:50
37

I would normally expect you to exercise the private member functions via the public interface. In this instance I would write different tests to feed different file contexts in, with different data sets present in order to exercise those methods.

I don't think your tests should know about those private methods. I think they're part of the implementation, and your test should be concerned with how your component functions, not how it's been implemented.

This class won't ever get used outside a small team at this office

You might be surprised what gets reused as soon as you make it public. An alternative is to accept that making it public will result in re-use, and then pull the methods out into a general utility class. That way you can test those methods separately (since you've publicised them) and since it's a public utility class, you're not making any implicit assumptions that they'll only be used in this scenario you're currently focused on.

Brian Agnew
  • 4,676
  • 24
  • 19
  • 4
    That last paragraph is particularly valuable. If your implementation details are interesting enough and useful enough it is worthwhile to consider pulling them out to a separate location where they are public and can be tested separately. – YoungJohn Jun 06 '16 at 13:29
  • Basically what your last paragraph is saying is not just "the public parts of your interface need to be unit tested", but in fact also the reverse: "the parts of your interface that are unit tested need / are interesting enough to be public". I never thought of it this way, but my first hunch is to agree with this and view it as a strong guideline. – CompuChip Jun 06 '16 at 14:48
  • 1
    "No need to wear a seatbelt. Today's itinerary doesn't mention any drunks running red li..." – Ed Plunkett Jun 06 '16 at 15:50
9

I think the issue comes from the design. My gut feeling says you either wrote the tests after the code, or that you already had a complete implementation in mind when you started writing the tests, and then you just shoehorned the tests to fit the design.

I think this type of trouble can be avoided by remembering the short TDD cycle of making a small assertion and then making it pass in the simplest way possible.

You speak of it being hard to exercise all the private methods via public methods. This is probably indicative of your class doing too much. If you can't construct it by adding test after test and just seeing the requirements being met and then refactoring it into maintainable and readable code, then you either don't have enough knowledge about the entity to implement it, or it's too complex (yeah yeah, it's a generalization, I know they're evil).

By looking at your method names it looks to me like this class has way too many responsibilities. It has several algorithm implementations, it manages threads, it even possibly reads files from disk. Split up your work into manageable reusable chunks. Matching/validation algorithms can easily be injected (as strategies, more more preferably imo, as delegates), thread management should probably occur at a higher level, etc.

When you no longer have a large complicated class with billions of responsibilities (well, more or less), the testing becomes almost trivial.

sara
  • 2,549
  • 15
  • 23
  • Thanks for the input. You're right that I wrote tests after the class - that's the way I normally work. TDD has value but I have issues with it (way outside the scope of the question). The class does less than you think: it's only 125 lines of code. It doesn't read files, and the actual distance algorithm is a separate class. – Bob Tway Jun 06 '16 at 10:23
  • 1
    Fair enough, I won't argue that point here since it's out of scope as you say. I hope the answer is useful to others though, it's a rule of thumb I like to use. – sara Jun 06 '16 at 12:39
  • 1
    it's still useful to me, and I appreciate the input :) got a +1. – Bob Tway Jun 06 '16 at 12:51
  • @MattThrower - To a certain extent you just contradicted yourself. Since the class is only 125 lines and does not do as much as we might think it does, then there should be no reason why it couldn't be thoroughly unit tested via a single public interface. – Cerad Jun 06 '16 at 13:23
  • @Cerad Yes, I was overzealous in saying it *can't* be tested by its public interface. My point was more that it would involve considerable effort and messy tests compared to leaving methods public - there's some discussion of this in the comments. But thanks for the useful point. – Bob Tway Jun 06 '16 at 13:29
  • 2
    +1 *This is probably indicative of your class doing too much* - this to me is what came jumping off the screen - an obvious violation of the [SRP](https://en.wikipedia.org/wiki/Single_responsibility_principle)... – Robbie Dee Jun 06 '16 at 14:47
  • @MattThrower have you considered that it's not the length what does matter, but complexity? For instance, if we take the cyclomatic complexity (https://en.wikipedia.org/wiki/cyclomatic_complexity) in account. So, different expressions with the same code length can affect that differently, depending on do they create additional executional flows. So, if you have a lot of variants to check - maybe you method, regardless of it's length, is too complex? – Ivan Kolmychek Jun 06 '16 at 20:32
  • @MattThrower you may find this section even more on-topic: https://en.wikipedia.org/wiki/Cyclomatic_complexity#Implications_for_software_testing – Ivan Kolmychek Jun 06 '16 at 20:38
6

I agree with @BrianAgnew and @kai, but would like to add more than a comment.

While an IDedupeFiler (or whatever) should be tested through its public interface, the OP has decided there is value in testing the individual sub-routines. Irrespective of file size or line count (which is only a rough proxy count for class responsibilites), the OP has decided that there is too much complexity to test from the top of this class.

This is a good thing incidentally, one of the reasons people like TDD is that the need to test forces the coder to adapt (and improve) their design. It’s valid to point out that the earlier tests are written the earlier this process happens, but the OP isn’t at all far down the road of untestable design decisions and refactoring will not be onerous.

The question seems to be whether the OP should (1) make the methods public and achieve testability with less refactoring, or whether they should do something else. I’d agree with @kai and say the OP should (2) split this class up into isolated, separately testable chunks.

(1) breaks encapsulation and makes it less immediately obvious what the use and public interface of the class actually is. I think the OP acknowledges that this isn’t the best OOP design choice in their question. (2) means more classes, but I don’t think that’s much of a problem, and it provides testability without design compromises.

If you really disagree with me that your sub-methods represent separate and separately testable concerns, then don't try to dig into the class to test them. Exercise them through your top public method. How hard you find this will be a good indicator of whether this was the right choice.

Nathan Cooper
  • 1,319
  • 9
  • 15
  • 3
    Good points. I like the last sentence. People sometimes say "I chose an untestable design because it was better", but I don't buy that. An untestable design is inherently bad imo. All code needs to be tested. Doesn't matter if it's an automatic unit test runner, some big full-system tests or manual exploration tests or whatever. Teh code MUST be tested, so designing for testability is high prio. I'd rather have 10 well tested small classes than 3 large complex ones that cause brittle unmaintainable tests (or worse yet: force me to fire up IIS, compile all aspx files and launch my browser) – sara Jun 06 '16 at 12:44
3

I think you might benefit from a slight shift in the way you view unit tests. Instead of thinking about them as a way to guarantee that all your code is working, think of them as a way to guarantee that your public interface does what you claim it does.

In other words, don't worry about testing the internals at all - write unit tests that prove that when you give your class X inputs, you get Y outputs - that's it. How it manages to do that doesn't matter at all. In fact, the private methods could be totally wrong, useless, redundant, etc., and as long as the public interface does what it is supposed to do, from the point of view of the unit test, it is working.

This is important because you want to be able to go back in and refactor that code later when a new library comes out that does it better, or you realize you were doing unnecessary work, or you just decide that the way things are named and organized could be made more clear. If your tests are only around the public interface, then you are free to do that without worrying about breaking things.

If you make this shift in the way you think about unit tests and find that it is still hard to write the test for a given class, then it is a good sign that your design needs some improvements. Maybe the class should try to do less - maybe some of those private methods really belong in a new smaller class. Or maybe you need to use dependency injection to manage external dependencies.

In this case, without knowing more details about how you are doing this deduplication, I'd guess that the methods you want to make public might actually be better off as separate classes, or as public methods in a utility library instead. That way, you could define the interface for each one along with its range of allowed inputs and expected outputs, and test them in isolation from the deduplication script itself.

thomij
  • 169
  • 3
3

In addition to the other answers, there's another benefit to making these functions private and testing them via the public interface.

If you gather code coverage metrics on your code, it becomes easier to tell when a function is no longer being used. But, if you make all of these functions public, and make unit tests for them, then they will always have at least the unit test calling them even when nothing else does.

Consider this example:

public void foo() {
    bar();
    baz();
}

public void bar() { ... }

public void baz() { ... }

And then you make individual unit tests for foo, bar, and baz. So they will all be called by your unit test framework, and they will all have code coverage saying they're being used.

Now consider this change:

public void foo() {
    bar();
}

public void bar() { ... }

public void baz() { ... }

Since your unit tests still call all of these functions, your code coverage is going to say they're all being used. But baz is no longer being called by any other part of your software other than the unit tests. It's effectively dead code.

Had you written it like this:

public void foo() {
    bar();
}

private void bar() { ... }

private void baz() { ... }

Then you would lose 100% of your code coverage of baz when foo changed, and that's your signal to remove it from the code. OR, this raises a red flag because baz is actually a super important operation, and its omission causes other problems (hopefully if this is the case, then other unit tests would catch it, but perhaps not).

Shaz
  • 2,614
  • 1
  • 12
  • 14
0

I would seperate the function and the test cases to seperate classes. The one contains the function the other contains the test cases which calls the function and asserts if the result is equals to what you expect. I'm not in C but in JAVA I would use Junit for this. This divides the logic and makes your class more readable. You also don't have to worry about your problem.

bish
  • 163
  • 1
  • 6
-1

You do not need to declare the methods public just to Unit-test them.

Unit test normally should belong to the same package where the class under testing belongs (of course, inside the different source code folder). As a result, also package private and protected methods can be accessed for testing. See Maven layout, for instance.

While it may not be worth writing very detailed tests for all internals, the visibility scope is not the only indicator to identify the unit.

h22
  • 905
  • 1
  • 5
  • 15
  • @Downvoter, you can write Java program where main() is the only public method, rest can be easily made at least package private even for a relatively complex application spanning over ten classes or about. Is it so that such application does not require any Unit testing? – h22 Jun 08 '16 at 06:25