How exactly should a CQRS Command be validated and transformed to a domain object?

Question

I have been adopting poor-man's CQRS¹ for quite some time now because I love its flexibility: granular data in one data store, which provides great possibilities for analysis and thus increases business value, and, when needed, another store for reads containing denormalized data for increased performance.

But unfortunately, pretty much from the beginning, I have been struggling with the problem of where exactly I should place business logic in this type of architecture.

From what I understand, a command is a means to communicate intent and does not have ties to a domain by itself. Commands are basically data (dumb, if you wish) transfer objects. This is to make commands easily transferable between different technologies. The same applies to events as responses to successfully completed commands.

In a typical DDD application the business logic resides within entities, value objects, aggregate roots, they are rich in both data as well as behavior. But a command is not a domain object thus it should not be limited to domain representations of data, because that puts too much strain on them.

So the real question is: Where exactly is the logic?

I have found out I tend to face this struggle most often when trying to construct a quite complicated aggregate which sets some rules about combinations of its values. Also, when modeling domain objects I like to follow the fail-fast paradigm, knowing when an object reaches a method it's in a valid state.

Let's say an aggregate Car uses two components:

Transmission,
Engine.

Both Transmission and Engine value objects are represented as super types and have corresponding sub types: Automatic and Manual transmissions, and Petrol and Electric engines, respectively.

In this domain, living on its own a successfully created Transmission, be it Automatic or Manual, or either type of an Engine is completely fine. But the Car aggregate introduces a few new rules, applicable only when Transmission and Engine objects are used in the same context. Namely:

When a car uses Electric engine the only allowed transmission type is Automatic.
When a car uses Petrol engine it may have either type of Transmission.
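To make the two rules concrete, here is a minimal sketch of them living inside the aggregate. It is written in Python for brevity, every name in it is hypothetical, and it uses enums where the description above uses super types with sub types. Each component is valid on its own; only `Car` knows which combinations are allowed.

```python
from dataclasses import dataclass
from enum import Enum


class EngineType(Enum):
    PETROL = "petrol"
    ELECTRIC = "electric"


class TransmissionType(Enum):
    MANUAL = "manual"
    AUTOMATIC = "automatic"


class InvalidCarConfiguration(ValueError):
    """Raised when components are valid alone but not together."""


@dataclass(frozen=True)
class Car:
    engine: EngineType
    transmission: TransmissionType

    def __post_init__(self) -> None:
        # Cross-component rule: checked when the aggregate is built,
        # so an invalid Car instance can never exist.
        if (self.engine is EngineType.ELECTRIC
                and self.transmission is not TransmissionType.AUTOMATIC):
            raise InvalidCarConfiguration(
                "an Electric engine requires an Automatic transmission"
            )
```

With this shape, a Petrol car accepts either transmission, while an Electric car with a Manual transmission fails at construction time.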

I could catch this component combination violation at the level of creating a command, but as I have stated before, from what I understand that should not be done because the command would then contain business logic which should be limited to the domain layer.

One of the options is to move this business logic validation to command validator itself, but this does not seem to be right either. It feels like I would be deconstructing the command, checking its properties retrieved using getters and comparing them within the validator and inspecting results. That screams like a violation of the law of Demeter to me.

Discarding the mentioned validation option because it does not seem viable, it seems like one should use the command and construct the aggregate from it. But where should this logic exist? Should it be within the command handler responsible for handling a concrete command? Or should it perhaps be within the command validator (I don't like this approach either)?

I am currently taking a command and creating an aggregate from it within the responsible command handler. But when I do this, a command validator, if I had one, would not contain anything at all: if the CreateCar command exists, it contains components which I know are valid on their own, but the aggregate might still reject the combination.

Let's imagine a different scenario mixing different validation processes - creating a new user using a CreateUser command.

The command contains the Id of the user to be created and their Email.

The system states the following rules for user's email address:

must be unique,
must not be empty,
must have at most 100 characters (max length of a db column).

In this case, even though having a unique email is a business rule, checking it in an aggregate makes very little sense, because I would need to load the entire set of current emails in the system into memory and check the email in the command against the aggregate (Eeeek! Something, something, performance.). Because of that, I would move this check to the command validator, which would take UserRepository as a dependency and use the repository to check whether a user with the email present in the command already exists.
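A minimal sketch of such a validator, in Python for brevity. The method name `email_exists`, the exception and the in-memory repository are assumptions made for illustration; a real repository would presumably run a single existence query against the database.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CreateUser:
    user_id: str
    email: str


class UserRepository(Protocol):
    # Hypothetical query method; only the validator depends on it.
    def email_exists(self, email: str) -> bool: ...


class EmailAlreadyTaken(Exception):
    pass


class CreateUserValidator:
    def __init__(self, users: UserRepository) -> None:
        self._users = users

    def validate(self, command: CreateUser) -> None:
        # A single existence check instead of loading every email.
        if self._users.email_exists(command.email):
            raise EmailAlreadyTaken(command.email)


class InMemoryUsers:
    """Stand-in for a database-backed repository."""

    def __init__(self, emails: set[str]) -> None:
        self._emails = emails

    def email_exists(self, email: str) -> bool:
        return email in self._emails
```

Note that the check and the later save are two separate steps, so a duplicate can still slip in between them unless the storage also enforces uniqueness.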

When it comes to this, it suddenly makes sense to put the other two email rules in the command validator as well. But I have a feeling these rules should really be present within the User aggregate, and that the command validator should only check uniqueness. If validation succeeds, I should proceed to create the User aggregate in the CreateUserCommandHandler and pass it to a repository to be saved.

I feel like this because the repository's save method is likely to accept an aggregate which ensures that once the aggregate is passed all invariants are fulfilled. When the logic (e.g. the non-emptiness) is only present within the command validation itself another programmer could completely skip this validation and call the save method in the UserRepository with a User object directly which could lead to a fatal database error, because the email might have been too long.
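One way to get that guarantee, sketched in Python under the assumption that the two local rules move into a small value object (the names `EmailAddress` and `MAX_EMAIL_LENGTH` are hypothetical): a `User` can then only ever hold an email that already passed both checks, no matter who constructs it.

```python
from dataclasses import dataclass

MAX_EMAIL_LENGTH = 100  # mirrors the database column width


@dataclass(frozen=True)
class EmailAddress:
    value: str

    def __post_init__(self) -> None:
        if not self.value:
            raise ValueError("email must not be empty")
        if len(self.value) > MAX_EMAIL_LENGTH:
            raise ValueError("email must have at most 100 characters")


@dataclass
class User:
    user_id: str
    email: EmailAddress  # always valid by construction
```

Anyone calling the repository's save method directly still has to build an `EmailAddress` first, so skipping the command validator no longer skips these two rules.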

How do you personally handle these complex validations and transformations? I am mostly happy with my solution, but I feel like I need affirmation that my ideas and approaches are not completely stupid to be pretty happy with the choices. I am entirely open to completely different approaches. If you have something you have personally tried and worked very well for you I would love to see your solution.

¹ Working as a PHP developer responsible for creating RESTful systems my interpretation of CQRS deviates a little from the standard async-command-processing approach, such as sometimes returning results from commands due to the need of processing commands synchronously.

need some example code i think. what do your command objects look like and where do you create them? — Ewan, May 04 '17 at 10:09
@Ewan I will add code samples either later today or tomorrow. Leaving for a trip in a few minutes. — Andy, May 04 '17 at 10:12
Being a PHP programmer I suggest to take a look at my CQRS+ES implementation: https://github.com/xprt64/cqrs-es — Constantin Galbenu, May 04 '17 at 11:22
@ConstantinGALBENU Should we consider Greg Young's interpretation of CQRS to be right (which we probably should) then your understanding of CQRS is wrong - or at least your PHP implementation is. Commands are not to be handled by aggregates directly. Commands are to be handled by command handlers which may produce changes in aggregates which then produce events to be used for state replications. — Andy, May 05 '17 at 17:36
I don't think our interpretations are different. You just have to dig more into DDD (at the tactical level of Aggregates) or open your eyes wider. There are at least two styles of implementing CQRS. I use one of them. My implementation more closely resembles the Actor model and makes the Application layer very thin, which is always a good thing. I observed that there is a lot of code duplication inside those app services and decided to replace them with a `CommandDispatcher`. — Constantin Galbenu, May 05 '17 at 17:47
@ConstantinGALBENU The problem I see in your approach is represented by the question: Where do the aggregates come from? If you are applying commands directly from them then how does the aggregate's current version find its way into your application layer? In Greg's interpretation this is done in command handlers having references to repositories using which the aggregate is loaded. This logic is completely absent in your examples. — Andy, May 05 '17 at 18:00
My documentation sucks. The aggregate is loaded from a repository every time a command reaches it. It is the CommandDispatcher that uses a Repository to load the Aggregate. The Repository loads all the previous events, applies them to the aggregate (after it creates a `new` instance of it), then it sends the command to it. Then it collects the `yielded` events and persists them in the repo using optimistic locking and version checking. Please note that I don't use inheritance in any of my domain classes. This is huge. It keeps my objects clean. — Constantin Galbenu, May 05 '17 at 18:06
@ConstantinGALBENU Can you come [here](http://chat.stackexchange.com/rooms/58272/cqrs-discussion-between-david-and-constantin)? I am interested in your approach, it will be easier to discuss it there. Thanks! — Andy, May 05 '17 at 18:10
Other frameworks force you to inherit from an AbstractAggregateRoot just to be able to extract the events from your aggregate. I don't. I also instantly apply the yielded event back onto the aggregate; in this way you can have very complex state-based algorithms with simple code. — Constantin Galbenu, May 05 '17 at 18:11
I adopted the cqrs.nu style after I saw that *all* the command handlers have the same pattern: they load the aggregate from the repo, then call a method on it, then collect the events, then persist them in the repo. **Every** handler is the same. Why not extract this pattern into a class? Also, in PHP we can use reflection to detect the aggregate's command handlers and call them automatically when a command is dispatched - no more manual subscribing to commands. Again, we can use reflection to automatically subscribe read-models and sagas to events - no more manual subscribing to events. — Constantin Galbenu, May 05 '17 at 18:23
[Jimmy Bogard](https://jimmybogard.com) recently blogged about this [here](https://jimmybogard.com/domain-command-patterns-validation/) and [here](https://jimmybogard.com/domain-command-patterns-handlers/) — devnull, May 05 '17 at 19:40

Constantin Galbenu · Accepted Answer · 2018-02-24T08:22:07.133

30

The following answer is in the context of the CQRS style promoted by cqrs.nu, in which commands arrive directly on the aggregates. In this architectural style the application services are replaced by an infrastructure component (the CommandDispatcher) that identifies the aggregate, loads it, sends it the command and then persists the aggregate (as a series of events if Event sourcing is used).
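The load, handle, persist cycle described above could be sketched roughly as follows. This is Python for brevity, not the PHP implementation linked in the comments; `RenameTodo`, `TodoRenamed` and the in-memory repository are hypothetical, and optimistic locking is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenameTodo:  # hypothetical command
    aggregate_id: str
    title: str


@dataclass(frozen=True)
class TodoRenamed:  # hypothetical event
    title: str


class Todo:
    """Aggregate: decides on commands, rebuilds its state from events."""

    def __init__(self) -> None:
        self.title = ""

    def apply(self, event: object) -> None:
        if isinstance(event, TodoRenamed):
            self.title = event.title

    def handle(self, command: RenameTodo) -> list:
        if not command.title:
            raise ValueError("a todo needs a title")
        if command.title == self.title:
            return []  # nothing to change, so no event
        return [TodoRenamed(command.title)]


class InMemoryTodoRepository:
    """Stores one event list per aggregate id (no version check here)."""

    def __init__(self) -> None:
        self.streams: dict[str, list] = {}

    def load(self, aggregate_id: str) -> Todo:
        todo = Todo()
        for event in self.streams.get(aggregate_id, []):
            todo.apply(event)
        return todo

    def append(self, aggregate_id: str, events: list) -> None:
        self.streams.setdefault(aggregate_id, []).extend(events)


class CommandDispatcher:
    def __init__(self, repository: InMemoryTodoRepository) -> None:
        self._repository = repository

    def dispatch(self, command: RenameTodo) -> None:
        # 1. identify and load the aggregate, 2. send it the command,
        # 3. persist whatever events it produced.
        aggregate = self._repository.load(command.aggregate_id)
        events = aggregate.handle(command)
        self._repository.append(command.aggregate_id, events)
```

Because persistence only happens inside `dispatch`, calling `handle` on an aggregate by hand produces events that are never stored.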

> So the real question is: Where exactly is the logic?

There are multiple kinds of (validation) logic. The general idea is to execute the logic as early as possible - fail fast if you want. So, the situations are as follows:

1. The structure of the command object itself: the command's constructor has some required fields that must be present for the command to be created. This is the first and fastest validation, and it is obviously contained in the command.
2. Low-level field validation, like the non-emptiness of some fields (like the username) or the format (a valid email address). This kind of validation should be contained inside the command itself, in the constructor. There is another style of having an isValid method, but this seems pointless to me, as someone would have to remember to call this method when in fact successful command instantiation should suffice.
3. Separate command validators: classes that have the responsibility to validate a command. I use this kind of validation when I need to check information from multiple aggregates or external sources. You could use this to check the uniqueness of a username. Command validators could have any dependencies injected, like repositories. Keep in mind that this validation is eventually consistent with the aggregate (i.e. when the user gets created, another user with the same username could be created in the meantime)! Also, do not try to put here logic that should reside inside the aggregate! Command validators are different from the Sagas/Process managers, which generate commands based on events.
4. The aggregate methods that receive and process the commands. This is the last (kind of) validation that occurs. The aggregate extracts the data from the command and, using some core business logic, accepts it (it performs changes to its state) or rejects it. This logic is checked in a strongly consistent manner. This is the last line of defense. In your example, the rule "When a car uses Electric engine the only allowed transmission type is Automatic" should be checked here.
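The first two kinds of validation can be illustrated with a command that refuses to exist in an invalid state. This is a sketch in Python rather than PHP; the field names and the deliberately loose email check are assumptions, not a recommended email grammar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreateUser:
    user_id: str
    email: str

    def __post_init__(self) -> None:
        # Required fields must be present for the command to exist.
        if not self.user_id:
            raise ValueError("user_id is required")
        # Low-level format check: something before and after one "@".
        local, separator, domain = self.email.partition("@")
        if not (local and separator and domain):
            raise ValueError("email is not well-formed")
```

There is no separate `is_valid` call to forget: holding a `CreateUser` instance already means these checks passed.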

> I feel like this because the repository's save method is likely to accept an aggregate which ensures that once the aggregate is passed all invariants are fulfilled. When the logic (e.g. the non-emptiness) is only present within the command validation itself another programmer could completely skip this validation and call the save method in the UserRepository with a User object directly which could lead to a fatal database error, because the email might have been too long.

Using the above techniques nobody can create invalid commands or bypass the logic inside the aggregates. Command validators are automatically loaded+called by the CommandDispatcher so nobody can send a command directly to the aggregate. One could call a method on the aggregate passing a command but could not persist the changes so it would be pointless/harmless to do so.

> Working as a PHP developer responsible for creating RESTful systems my interpretation of CQRS deviates a little from the standard async-command-processing approach, such as sometimes returning results from commands due to the need of processing commands synchronously.

I'm also a PHP programmer and I don't return anything from my command handlers (aggregate methods in the form handleSomeCommand). I do, however, quite often, return information to the client/browser in the HTTP response, for example the ID of the newly created aggregate root or something from a read-model but I never return (really never) anything from my aggregate command methods. The simple fact that the command was accepted (and processed - we are talking about synchronous PHP processing, right?!) is sufficient.

We can return something to the browser (and still do CQRS by the book) because CQRS is not a high-level architecture.

An example of how command validators work:
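No example survives at this point in the text, so what follows is only a sketch and not the author's code. It takes the validator named in the comments below, `UserCanPlaceOrdersOnlyIfHeIsNotLockedValidator`, and assumes (as those comments describe) that it keeps its own state from events of another aggregate type. The events `UserLocked` and `UserUnlocked`, the `PlaceOrder` command and the `on_event`/`validate` methods are all hypothetical, and the code is Python rather than PHP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserLocked:  # hypothetical event from the User side
    user_id: str


@dataclass(frozen=True)
class UserUnlocked:  # hypothetical event from the User side
    user_id: str


@dataclass(frozen=True)
class PlaceOrder:  # hypothetical command for the Order side
    order_id: str
    user_id: str


class CommandRejected(Exception):
    pass


class UserCanPlaceOrdersOnlyIfHeIsNotLockedValidator:
    def __init__(self) -> None:
        self._locked_users: set[str] = set()

    def on_event(self, event: object) -> None:
        # Build local state from events of another aggregate type.
        if isinstance(event, UserLocked):
            self._locked_users.add(event.user_id)
        elif isinstance(event, UserUnlocked):
            self._locked_users.discard(event.user_id)

    def validate(self, command: PlaceOrder) -> None:
        # Eventually consistent: this state may lag behind the users.
        if command.user_id in self._locked_users:
            raise CommandRejected("locked users cannot place orders")
```

A dispatcher would call `validate` before the command reaches the aggregate, so the order aggregate itself never needs to know about user locking.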

edited Feb 24 '18 at 08:22

answered May 04 '17 at 12:19


In regard to your validation strategy, point number two jumps out at me as a likely place where logic will be duplicated often. Certainly one would want the User aggregate to validate a non-empty and well-formed email as well no? This becomes apparent when we introduce a ChangeEmail command. – user3347715 Feb 23 '18 at 17:32
@king-side-slide not if you have an `EmailAddress` value object that validates itself. – Constantin Galbenu Feb 23 '18 at 17:34
That is entirely correct. One could encapsulate an `EmailAddress` in order to reduce duplication. More importantly though, in doing so you would also be moving the logic out of your command and into your domain. It's worth noting that this can be taken too far. Often similar pieces of knowledge (value objects) may have different validation requirements depending on who's using them. `EmailAddress` is a convenient example because the entire conception of this value has global validation requirements. – user3347715 Feb 23 '18 at 17:39
Similarly, the idea of a "command validator" seems unnecessary. The goal isn't to prevent invalid commands from being created and dispatched. The goal is to prevent them from executing. For example, I can pass any data I want with a URL. If it's invalid, the system rejects my request. The command is still created and dispatched. Should a command require multiple aggregates for validation (i.e. a collection of Users to check for email uniqueness), a domain service is a better fit. Objects like "x validator" are often a sign of an anemic model where the data is being separated from the behavior. – user3347715 Feb 23 '18 at 18:49
@king-side-slide please keep in mind that there is a multitude of cases, architectures and styles. A command validator is a Service in my architecture that checks a command in an eventually consistent manner. For example, to check for authorization in a monolith. This kind of check is done outside the transactional boundary. – Constantin Galbenu Feb 23 '18 at 18:59
I understand that cutting through the complexity of a large system can manifest in many ways, but an object solely responsible for validating the state of something else is rarely a good solution (except possibly a *generated* object as part of the implementation of a larger system). The utility of any object is predicated on its behavior. If all a "command validator" does is check if a command will fail after it's been dispatched, it is redundant. It simply ends up as a place for duplication and unnecessary coupling. Separating data from behavior is the antithesis of DDD. It's procedural. – user3347715 Feb 23 '18 at 20:43
@king-side-slide the command validator builds and maintains its own state from the events coming from other aggregate types. It's behavior and its needed state, exactly as it should be. Try to think outside the box, as you don't have all the data. – Constantin Galbenu Feb 23 '18 at 21:10
@king-side-slide also, every command validator has only one responsibility: to reject a specific command for a single reason. So each command, before it reaches an Aggregate, can be rejected by one of its out-of-bounded-context command validators. – Constantin Galbenu Feb 23 '18 at 21:24
Can you give me a concrete example? I'm interested in how this is implemented. Here's the flow I'm imagining: a `CreateUserValidator` is initialized (on startup?) that subscribes to *multiple* events which is then invoked with a `CreateUser` command to which it provides validation according to the events it previously received. What kind of events could it receive before the command reaches it? A `VisitorAcceptedPrivacyStatement` event? How is it not deterministic in nature? It seems like that information could simply be included on the command itself. This sounds like a distributed ... – user3347715 Feb 23 '18 at 21:58
... service where, instead of having application Services which are injected with outside information (repositories etc.) that have methods which handle commands by orchestrating domain objects, you've *separated* and isolated each of these methods into their own classes named "command validators". Functionally, these seem like similar concepts (save the whole event subscribing bit). I think you've reached a hybrid between a process manager and an application service. Process managers are perfectly capable of rejecting commands (events) according to the state of the process it's managing – user3347715 Feb 23 '18 at 22:05
@king-side-slide A concrete example is `UserCanPlaceOrdersOnlyIfHeIsNotLockedValidator`. You can see that this is a separate domain from that of the Orders, so it can't be validated by the OrderAggregate itself. – Constantin Galbenu Feb 24 '18 at 00:44
Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/73616/discussion-between-constantin-galbenu-and-king-side-slide). – Constantin Galbenu Feb 24 '18 at 00:52
What is it about that rule that puts it in a separate domain? What data does your validator require? Is there nowhere in your domain where that data is naturally grouped? Why not? It seems like a rule like that would be grouped with whatever cross-section of your `User` can place orders (`Buyer`?). I have found that rules like your example are usually found in an anemic model. Where the code base is largely built in a procedural paradigm modeled as objects. Functions are elevated to objects and data is encapsulated and passed around for processing. Objects generally have lots of getters. – user3347715 Feb 24 '18 at 05:01
@king-side-slide I respond only in chat. – Constantin Galbenu Feb 24 '18 at 07:25
@Constantin Galbenu, does it not make the domain model anemic by putting the validation logic in the command? Do you know of any open source examples where this is done? Thanks. +1. – w0051977 Jun 05 '18 at 07:25
@w0051977 No, because the Command is from the Domain, it lives in the Domain (i.e. it should be declared in the same namespace as the Aggregate that handles it). The Command takes some low level responsibilities from the Aggregate so that the Aggregate could focus on the high level business logic. – Constantin Galbenu Jun 05 '18 at 07:28
@w0051977 I see the "domain model" mainly as the Aggregate. The Aggregate contains the Aggregate root and all the commands and events and value objects that it owns. – Constantin Galbenu Jun 05 '18 at 07:29
@Constantin Galbenu, do you know of any code samples where this is done? I have looked here, however there appears to be no validation: https://github.com/gregoryyoung/m-r/tree/master/SimpleCQRS – w0051977 Jun 05 '18 at 07:32
@w0051977 I have an example but in PHP: https://github.com/xprt64/todosample-cqrs-es/blob/master/src/Domain/Write/Todo/TodoAggregate/Command/AddNewTodo.php – Constantin Galbenu Jun 05 '18 at 07:34
@Constantin Galbenu, following on from a lot of research recently I have asked a similar question here: https://softwareengineering.stackexchange.com/questions/372118/where-should-i-put-validation-logic-when-using-cqrs in case you would like to answer. – w0051977 Jun 05 '18 at 15:55
@w0051977 I've already answered it here, in the comments above. As a note: the Aggregate may delegate some low level validation to its Commands, especially initialization validation. Also, the principle here is Fail fast. – Constantin Galbenu Jun 05 '18 at 16:01
@Constantin Galbenu, is the command object the same as a domain object in your mind? If not then may I ask why the entity is not responsible for its validation? – w0051977 Jun 06 '18 at 12:04
@w0051977 testing for command's validity should be the responsibility of the command itself. The Aggregate roots and nested entities have higher level responsibilities like checking the domain invariants. I consider the Command part of the domain layer so you could say that is a kind of domain object. – Constantin Galbenu Jun 06 '18 at 14:45
@Constantin Galbenu, In your GitHub example what would happen if another developer subclassed AddNewTodo (maybe externally to your domain project) and did not bother adding any validation. Would this not break the Always Valid rule? – w0051977 Jun 08 '18 at 10:31
@w0051977 The `TodoAggregate` handles only commands for which it has subscribed (in my case all commands that are parameters to `handleXXX` methods because the registration is done automatically). Any subclass of `AddNewTodo` is pointless and sending such a command would generate a `CommandHandlerNotFound` exception. – Constantin Galbenu Jun 08 '18 at 11:57
@Constantin Galbenu, I believe your answer is supported by MikeSW in his comment here: https://stackoverflow.com/questions/30190302/what-is-the-difference-between-invariants-and-validation-rules when he says: "Validation usually handles data formats which can be part of a business rule. As a thumb rule: you use validation to ensure the input data is in valid format, then the business rules decide how/if the input changes a model". Is that right? – w0051977 Jun 08 '18 at 14:25
@w0051977 so it seems – Constantin Galbenu Jun 08 '18 at 14:28
@Constantin Galbenu, I have asked another question here: https://softwareengineering.stackexchange.com/questions/372338/zero-arguement-constructors-and-always-valid-entities. I would be grateful for your comments/answer if you would like to. I now agree that validation should go in the command. – w0051977 Jun 10 '18 at 09:50

user3347715 · Answer 2 · 2018-02-22T21:16:29.937

One fundamental premise of DDD is that domain models validate themselves. This is a critical concept because it elevates your domain as the responsible party for making sure your business rules are enforced. It also keeps your domain model as the focus for development.

A CQRS system (as you correctly point out) is an implementation detail representing a generic sub-domain which implements its own cohesive mechanism. Your model should in no way depend on any piece of CQRS infrastructure to behave according to your business rules. The goal of DDD is to model the behavior of a system such that the result is a useful abstraction of the functional requirements of your core business domain. Moving any single piece of this behavior out of your model, however tempting, reduces the integrity and cohesion of your model (and makes it less useful).

Simply by extending your example to include a ChangeEmail command, we can perfectly illustrate why you don't want any of your business logic in your command infrastructure as you would need to duplicate your rules:

email cannot be empty
email cannot be longer than 100 chars
email must be unique

So now that we can be sure our logic needs to be in our domain, let's tackle the issue of "where". The first two rules can be easily applied to our User aggregate, but that last rule is a bit more nuanced; one that requires some further knowledge crunching to gain some deeper insight. At the surface, it may seem like this rule applies to a User, but it really doesn't. The "uniqueness" of an email applies to a collection of Users (according to some scope).

Ah ha! With that in mind, it becomes abundantly clear that your UserRepository (your in-memory collection of Users) may be a better candidate for enforcing this invariant. The "save" method is likely the most reasonable place to include the check (where you can throw a UserEmailAlreadyExists exception). Alternatively, a domain UserService could be made responsible for creating new Users and updating their attributes.
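A rough sketch of the "save" variant, in Python for brevity. The in-memory storage and the linear scan are assumptions made to keep the example self-contained; with a real database the same rule would more likely be backed by a unique constraint whose violation is translated into the exception.

```python
from dataclasses import dataclass


class UserEmailAlreadyExists(Exception):
    pass


@dataclass
class User:
    user_id: str
    email: str


class InMemoryUserRepository:
    """Collection of Users that owns the uniqueness invariant."""

    def __init__(self) -> None:
        self._users_by_id: dict[str, User] = {}

    def save(self, user: User) -> None:
        # Another user holding the same email violates the invariant;
        # re-saving the same user with its own email is allowed.
        for other in self._users_by_id.values():
            if other.email == user.email and other.user_id != user.user_id:
                raise UserEmailAlreadyExists(user.email)
        self._users_by_id[user.user_id] = user
```

The same check then covers both creating a user and changing an existing user's email, since both paths end in `save`.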

Fail fast is a good approach, but can only be done where and when it fits in with the rest of the model. It can be extremely tempting to check parameters on an application service method (or command) before processing further in an attempt to catch failures when you (the developer) know the call will fail somewhere deeper in the process. But in doing so, you will have duplicated (and leaked) knowledge in a way that will likely require more than one update to the code when the business rules change.

I agree with this. My reading up until now (without CQRS) tells me that validation should always go in the domain model to protect the invariants. Now I am reading CQRS it is telling me to put the validation in the Command objects. This seems counter intuitive. Do you know of any examples e.g. on GitHub where the validation is put in the Domain Model instead of the Command? +1. — w0051977, Jun 05 '18 at 09:14
