21

Suppose we have a method foo(String bar) that only operates on strings that meet certain criteria; for example, it must be lowercase, must not be empty or have only whitespace, and must match the pattern [a-z0-9-_./@]+. The documentation for the method states these criteria.

Should the method reject any and all deviations from this criteria, or should the method be more forgiving about some criteria? For example, if the initial method is

public void foo(String bar) {
    if (bar == null) {
        throw new IllegalArgumentException("bar must not be null");
    }
    if (!bar.matches(BAR_PATTERN_STRING)) {
        throw new IllegalArgumentException("bar must match pattern: " + BAR_PATTERN_STRING);
    }
    this.bar = bar;
}

And the second forgiving method is

public void foo(String bar) {
    if (bar == null) {
        throw new IllegalArgumentException("bar must not be null");
    }
    if (!bar.matches(BAR_PATTERN_STRING)) {
        bar = bar.toLowerCase().trim().replaceAll(" ", "_");
        if (!bar.matches(BAR_PATTERN_STRING) {
            throw new IllegalArgumentException("bar must match pattern: " + BAR_PATTERN_STRING);
        }
    }
    this.bar = bar;
}

Should the documentation be changed to state that it will be transformed and set to the transformed value if possible, or should the method be kept as simple as possible and reject any and all deviations? In this case, bar could be set by the user of an application.

The primary use-case for this would be users accessing objects from a repository by a specific string identifier. Each object in the repository should have a unique string to identify it. These repositories could store the objects in various ways (sql server, json, xml, binary, etc) and so I tried to identify the lowest common denominator that would match most naming conventions.

Zymus
  • 2,403
  • 3
  • 19
  • 35
  • 1
    This probably depends heavily on your use case. Either one could be reasonable, and I've even seen classes that provide both methods and make the user decide. Could you elaborate on what this method/class/field is supposed to do, so we could offer some real advice? – Ixrec Aug 02 '15 at 23:00
  • 1
    Do you know everyone who calls the method? As in, if you change it, can you reliably identify all the clients? If so, I'd go with as permissive and forgiving as performance concerns allow. I might also delete the documentation. If not, and it's part of a library API, I'd make sure the code implemented exactly the advertised API, as otherwise changing the code to match the documentation in future is apt to generate bug reports. – Jon Chesterfield Aug 02 '15 at 23:01
  • I've edited the OP for the primary use-case. – Zymus Aug 02 '15 at 23:04
  • Potentially not programming issue. Ideally, they should match. The question when they're not, which one is _the_ correct one? Once that's been decided, the next question is should it be fixed? I'd check with the manager. – imel96 Aug 03 '15 at 01:31
  • 7
    You could argue that Separation of Concerns says that if necessary, you should have a strict `foo` function that is rigorous in what arguments it accepts, and have a second helper function that can try to "clean" an argument to be used with `foo`. This way, each method has less to do on its own, and they can managed and integrated more cleanly. If going down that route, it would probably also be helpful to move away from an exception-heavy design; you can use something like `Optional` instead, and then have the functions that consume `foo` throw exceptions if necessary. – gntskn Aug 03 '15 at 03:03
  • 1
    This is like asking "someone wronged me, should I forgive them?" Obviously there are circumstances where either one *or* the other is appropriate. Programming may not be as complicated as human relationships, but it is definitely complex enough that a blanket prescription like this isn't going to work. – Kilian Foth Aug 03 '15 at 06:12
  • @giantskin Indeed. I would add to that that the method that does the "cleaning" return a typed thing, so a suitable string goes through a sort of coercion process from string -> actual_type, and this is the type that the function/method should receive. In Java this can be done by pretending that classes are types (which isn't quite true, but catches this case), in some languages there are more explicit techniques (actual types, always passing a closure, whatever), in some others you just fbtsoyp. The point is to let the function be *free* to do exactly what it says it does. – zxq9 Aug 03 '15 at 11:01
  • Postel's Law (the Robustness principle) states: "Be conservative in what you do, be liberal in what you accept from others." So accept non-conformant input as long as the meaning is clear. I believe your forgiving implementation does this. – Boggin Aug 03 '15 at 11:40
  • 2
    @Boggin I would also point you to [The Robustness Principle Reconsidered](http://queue.acm.org/detail.cfm?id=1999945). The difficulty comes when you need to expand the implementation and the forgiving implementation leads to an ambiguous case with the expanded implementation. –  Aug 03 '15 at 16:11
  • You've type hinted for a String object without a default. Why would you be able to pass in a null? Is this a weakness specific to your language? – James Aug 03 '15 at 17:25
  • @Jimbo This is in Java. – Zymus Aug 03 '15 at 17:33
  • @Zymus Ah, I would've expected you to say `foo(String bar = null) { }`, my JavaFoo isn't that good :-) – James Aug 03 '15 at 18:35
  • As soon as you implemented the second version of the method, you invalidated your documentation for the method. It now *does* support strings with spaces. – Rob Aug 03 '15 at 23:47

6 Answers6

47

Your method should do what it says it does.

This prevents bugs, both from use and from maintainers changing behavior later. It saves time because maintainers don't need to spend as much time figuring out what is going on.

That said, if the defined logic isn't user friendly, it should perhaps be improved.

Telastyn
  • 108,850
  • 29
  • 239
  • 365
  • 8
    This is the key. If your method does exactly what it says it does, the coder using your method will compensate for their specific use case. Don't ever do something undocumented with the method just because you think it is useful. If you need to change it, write a container or change the documentation. – Nelson Aug 03 '15 at 02:05
  • I'd add to @Nelson's comment that the method shouldn't be designed in a vacuum. If the coders say they'll use it but will compensate and their compensations have general-purpose value, consider making them part of the class. (E.g., have `foo` and `fooForUncleanString` methods where the latter makes the corrections before passing it to the former.) – Blrfl Aug 03 '15 at 17:47
20

There are a few points:

  1. Your implementation must do what the documented contract states, and should do nothing more.
  2. Simplicity is important, for both contract and implementation, though more for the former.
  3. Trying to correct for erroneous input adds complexity, counter-intuitively not only to contract and implementation but also to use.
  4. Errors should only be caught early if that improves debugability and doesn't compromise efficiency too much.
    Remember that there are debug-assertions for diagnosing logic errors in debug-mode, which mostly alleviates any performance-concerns.
  5. Efficiency, as far as available time and money allow without compromising simplicity too much, is always a goal.

If you implement a user-interface, friendly error-messages (including suggestions and other help) are part of good design.
But remember that APIs are for programmers, not end-users.


A real-life experiment in being fuzzy and permissive with input is HTML.
Which resulted in everyone doing it slightly differently, and the spec, now it is documented, being a mammoth tome full of special cases.
See Postel's law ("Be conservative in what you do, be liberal in what you accept from others.") and a critic touching on that (Or a far better one MichaelT made me aware of).

Deduplicator
  • 8,591
  • 5
  • 31
  • 50
  • Another critical piece by the author of sendmail: [The Robustness Principle Reconsidered](http://queue.acm.org/detail.cfm?id=1999945) –  Aug 03 '15 at 16:12
15

A method's behavior should be clear cut, intuitive, predictable, and simple. In general, we should be very hesitant to do extra processing on a caller's input. Such guesses about what the caller intended invariably have a lot of edge cases that produce undesired behavior. Consider an operation as simple as file path joining. Many (or perhaps even most) file path joining functions will silently discard any preceding paths if one of the paths being joined appears to be rooted! For example, /abc/xyz joined with /evil will result in just /evil. This is almost never what I intend when I join file paths, but because there's no interface that doesn't behave this way, I'm forced to either have bugs or write extra code covering these cases.

That said, there are rare occasions when it makes sense for a method to be "forgiving," but it should always be within the power of the caller to decide when and whether these processing steps apply to their situation. So when you've identified a common preprocessing step that you want to apply to arguments in a variety of situations, you should expose interfaces for:

  • The raw functionality, without any preprocessing.
  • The preprocessing step by itself.
  • The combination of the raw functionality and the preprocessing.

The last is optional; you should only provide it if a large number of calls will use it.

Exposing the raw functionality gives the caller the ability to use it without the preprocessing step when they need to. Exposing the preprocessor step by itself allows the caller to use it for situations where they're not even calling the function or when they want to preprocess some input before calling the function (like when they want to pass it into another function first). Providing the combination allows callers to invoke both without any hassle, which is useful primarily if most callers will use it this way.

jpmc26
  • 5,389
  • 4
  • 25
  • 37
  • 2
    +1 for predictable. And another +1 (I wish) for simple. I would much rather you help me spot and correct my mistakes than try to hide them. – John M Gant Aug 03 '15 at 13:50
4

As others have said, making the string matching "forgiving" means introducing additional complexity. That means more work in implementing the matching. You now have many more test cases, for example. You have to do additional work to ensure there are no semantically-equal names in the namespace. More complexity also means there is more to go wrong in the future. A simpler mechanism, such as a bicycle, requires less maintenance than a more complex one, such as a car.

So is the lenient string matching worth all that extra cost? It does depend on the use case, as others have noted. If the strings are some kind of external input you have no control over, and there is a definite advantage to lenient matching, it might be worth it. Perhaps the input is coming from end users who may not be very conscientious about space characters and capitalization, and you have a strong incentive to make your product easier to use.

On the other hand, if the input were coming from, say, properties files assembled by technical folks, who should understand that "Fred Mertz" != "FredMertz", I'd be more inclined to make the matching stricter and save the development cost.

I do think in any case there is value in trimming and disregarding leading and trailing spaces, though -- I've seen too many hours wasted on debugging those kinds of issues.

mat_noshi
  • 181
  • 3
3

You mention some of the context from which this question comes.

Given that, I would have the method do just one thing, it asserts the requirements on the string, let it execute based on that - I wouldn't try to transform it here. Keep it simple and keep it clear; document it, and endeavour to keep the documentation and code in sync with each other.

If you wish to transform the data that comes from the user database in some more forgiving manner, put that functionality into a separate transformation method and document the functionality tied it.

At some point the function's requirements need to be metered out, clearly documented and the execution must continue. The "forgiveness", at his point, is a little mute, it's a design decision and I would argue for the function not mutating its argument. Having the function mutate the input hides some of the validation that would be required of the client. Having a function that does the mutation helps the client get it right.

The big emphasis here is clarity and to document what the code does.

Niall
  • 1,831
  • 1
  • 18
  • 20
-1
  1. You can name a method according to the action like doSomething() , takeBackUp () .
  2. To make it easy to maintain you can keep the common contracts and validation on different procedures . Call them as per use cases.
  3. Defensive programming : your procedure handles wide range of input including (Minimum things those are use cases must be covered anyway)