9

Consider the following approach to validation of an API class (POJO or what have you, I mean a class which just acts as a container for some properties), we make all constructors private to the API scope (and actually let us insist there only be one constructor), and the constructor does validation of its inputs, throwing exceptions if the inputs are invalid.

We then provide some static method either in the API package or on the class itself (and then make the constructor fully private). All this static method does is call the constructor for us and transform the exception case into something nicer. I'm mostly a scala developer so let's say we get a

Either[APIDefinedExceptionTrait, ClassWeWant]

Now the only way to make a ClassWeWant is through this static method.

If I contrast this with the typical approach I see (in Scala), of instantiating the class and then having an external validation method (I really dislike this approach), I see a couple of important to me benefits.

  1. It is not possible to instantiate a class with invalid state and avoid the validation.
  2. The validation, as it lives directly in the class, is essentially self documenting the constraints the properties of the class has. With the external method approach, theis is decoupled and it can be hard to know what constraints hold.
  3. It makes, to me, more logical sense. If my type system was powerful enough to encode the constraints in my type without it being tediously complex, I would do that and then the constraint would live in the definition of the class itself (which is what my proposed approach to validation does).

I have never seen this approach in code, so I suspect it's a bad idea. But I can't really understand why, I see two downsides and one suspcious thing.

Downside 1 - It becomes tedious to create the class when you know the inputs are already correct. But I mostly see this validation pattern used in relation to user input, e.g. serialising a post body into a class where the class is only really liable to be made once and then passed around (and admittedly maybe serialised/deserialized).

Downside 2 - It has the overhead of creating exceptions, handling them etc. But as above for my use case I feel this is not a problem as my classes will be small and infrequently made.

Suspcious thing - I am using exceptions for control flow, but I think I am justified as a work around of how constructors work. If I could tell the constructor to return an Either of course I of course would.

Are there some more obvious problems with this?

JustSomeGuy
  • 181
  • 1
  • 7
  • 1
    Does this answer your question? [Legitimate "real work" in a constructor?](https://softwareengineering.stackexchange.com/questions/305464/legitimate-real-work-in-a-constructor) – gnat Sep 03 '21 at 19:59
  • @gnat No, that question is I feel too general. I and Scala developers in general I think are sold on the idea of doing some 'real work' in the constructor. – JustSomeGuy Sep 03 '21 at 20:25
  • How do you define "real work?" – Robert Harvey Sep 03 '21 at 20:25
  • 1
    Well, I think it's fair to say that any work required to stand up a valid object of the constructor's type is legitimate work for a constructor, no matter how elaborate it is. When you build a car, you don't distinguish between real work and unreal work; it is all required. But it's no longer the constructor's responsibility once the customer drives it off the lot. – Robert Harvey Sep 03 '21 at 20:34
  • @RobertHarvey I don't know really I've never understood the expression myself I was probably hasty in saying it people were sold on it. But I do know that I've seen arbitrarily complex stuff happen which may be effectful, e.g. initialising some timer, trying to contact some service or something that might throw. I imagine these things constitute real work. – JustSomeGuy Sep 03 '21 at 20:47

3 Answers3

15

You should do validation in a constructor, to the extent that it is necessary to ensure that your constructor can stand up a viable object.

This is especially true of any dependencies or data you hand to your constructor. If the constructor can't work out a way to produce a viable object from the arguments you present to it, it should throw an exception.

The overhead of a thrown exception in this context is inconsequential; you don't have a viable object that could exhibit desirable performance characteristics, so the notion that an exception thrown here might hurt your performance is meaningless, unless you're creating many objects in a tight loop or something like that.

Nor is this a flow control problem; by definition, if you are throwing an exception, you are doing so because something happened that cannot be (easily) recovered from.

Moving constructor validation to an external method disembodies the validation from the constructor that's using it. Nothing good can come from that; the whole point of using a class is to encapsulate its data and logic, not to farm it out.


All that said, let's dig into your specific case.

  1. Most of what I said applies to ordinary classes. POJO's don't generally contain constructors (with parameters). The whole point of making a POJO is to merely have a lightweight container for data.

  2. Input validation can take place anywhere, and often takes place in multiple areas within a software system. Each instance of input validation can decide what, if any, constraints are placed on the data handed to it via a POJO.

  3. Such validation doesn't necessarily require the signaling of exceptions to indicate a problem, so flow control and performance issues are moot. In a constructor, you have few choices. The only real choice you have if you cannot construct a valid object is to throw.

Robert Harvey
  • 198,589
  • 55
  • 464
  • 673
  • I am happy to see you essentially confirm what I thought, and it is good to have my 'downsides' dismissed. This external validation approach I described really is quite common in Scala though. I feel that in Scala people are a bit fervent in their avoiding of Exceptions at all costs, which is why I propose this extra nonsense with the static method to get an Either. – JustSomeGuy Sep 03 '21 at 21:04
  • 1
    @JustSomeGuy: The benefit of monads like [`Either`](https://www.freecodecamp.org/news/a-survival-guide-to-the-either-monad-in-scala-7293a680006/) and [`Maybe`](https://en.wikibooks.org/wiki/Haskell/Understanding_monads/Maybe) is that you can think more "functionally." [Thinking functionally](https://en.wikipedia.org/wiki/Functional_programming) has benefits; you can take advantage of method chaining, continuation-passing style, and the like. Functional programming style is easier to reason about from a "correctness" perspective. `Either` and `Maybe` work better in higher-order functions. – Robert Harvey Sep 03 '21 at 21:31
  • One point to add to the list is that in this case it is not realistically possible to report multiple problems in the input. The exception would be thrown on the first problem and the rest of the validation would not be performed unless you explicitly account for the situation in your validation logic. – Bart van Ingen Schenau Sep 05 '21 at 08:30
  • @Bart: Oh, I don't think that's true. You can check as many things as you want, and report them all in the exception if you wish. But generally speaking, I think it's more useful to throw as soon as you know you can't construct. – Robert Harvey Sep 05 '21 at 12:59
2

I wouldn't say that it is a bad idea, but in the case of validating serialized data there is one thing you have to really pay attention to. Usually when you have exceptions you throw them as soon as you have a problem. But if the data classes have a lot of properties and several of them are incorrect at the same time what are you going to do? Fix the first one, re-run everything and get a second exception, fix it and re-run again fixing all the properties one by one? It could get a bit annoying. When you throw a validation exception you better list immediately all the validation errors.

In normal cases you might have other minor issues:

  • some properties might need the same validation rules in the constructor and in the set method. But if one property depends on another calling the set method from the constructor is not a good idea, what are you going to do? Put the validation rules in static methods that take both properties as arguments or duplicate the code?
  • what if, in some situations, you don't have all the properties needed to create an object at the same time? Will you store some properties in temporary objects? Will you instantiate a builder and call build() when you have all of them? In this case you might get an error message some time after the error actually happened. If a user if filling different forms and they get an error message they look for the missing field only in the current page.
FluidCode
  • 709
  • 3
  • 10
1

The prohibition on “real work” is meant to stop you from doing anything but setting a valid state. Anything else should either be done before or after.

Validation can be skipped so long as the type system only allows valid states. Otherwise validate state when setting state.

Keeping constructors this simple prevents complications when some new use of the valid state is added. Just add a new method. It also makes conversions into this state some other objects problem. Now sure, there are other ways to do this. The beautiful thing about forbidding constructors from doing real work is now there aren’t other ways to do it. Your path becomes clear.

And it’s really nice when constructing an object isn’t really a significant performance consideration. Lets you save the work for later when you’re sure what you need it to be.

candied_orange
  • 102,279
  • 24
  • 197
  • 315