
I'm diving into the concepts of Domain-Driven Design (DDD) and found some of its principles strange, especially the isolation of the domain model from the persistence model. Here is my basic understanding:

  1. A service on the application layer (providing a feature set) requests the domain objects it needs to carry out its function from a repository.
  2. The concrete implementation of this repository fetches data from the storage it was implemented for.
  3. The service tells the domain object, which encapsulates business logic, to perform certain tasks that modify its state.
  4. The service tells the repository to persist the modified domain object.
  5. The repository needs to map the domain object back to the corresponding representation in storage.

[Flow illustration]

Now, given the above assumptions, the following seems awkward:

Ad 2.:

The repository seems to load the entire domain object (including all fields and references), even if parts of it are not needed by the function that requested it. Loading it entirely might not even be possible if other domain objects are referenced, unless you load those domain objects as well, and all the objects they reference in turn, and so on and so forth. Lazy loading comes to mind; however, that means the domain objects start issuing queries themselves, which should be the responsibility of the repository in the first place.

Given this problem, the "correct" way of loading domain objects seems to be to have a dedicated loading function for each use case. These dedicated functions would then only load the data required by the use case they were designed for. Here's where the awkwardness comes into play: first, I would have to maintain a considerable number of loading functions for each implementation of the repository, and second, domain objects would end up in incomplete states, carrying null in their fields. The latter should technically not be a problem, because if a value was not loaded, it should not be required by the functionality that requested it anyway. Still, it's awkward and a potential hazard.
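To make this concrete, here is a minimal sketch (all type and method names are hypothetical) of what such use-case-specific loading functions could look like:

// Hypothetical repository with one loading method per use case;
// each method loads only the fields its use case needs.
public interface ICustomerRepository
{
    // Renaming needs only the id and the name.
    Customer LoadForRename(string customerId);

    // Invoicing needs the customer plus its billing address.
    Customer LoadForInvoicing(string customerId);

    void Save(Customer customer);
}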

Ad 3.:

How would a domain object verify uniqueness constraints upon construction if it does not have any notion of the repository? For instance, if I wanted to create a new User with a unique (given) social security number, the earliest conflict would occur when asking the repository to save the object, and only if there is a uniqueness constraint defined on the database. Alternatively, I could look for a User with the given social security number and report an error if one exists, before creating a new one. But then the constraint checks would live in the service and not in the domain object, where they belong. I just realised that domain objects are very well allowed to use (injected) repositories for validation.
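As a sketch of that last realisation (all names are made up), a domain service could take the repository as a dependency and check the invariant before constructing the object:

using System;

// Hypothetical domain service guarding the uniqueness invariant.
public class UserRegistrationService
{
    private readonly IUserRepository _users;

    public UserRegistrationService(IUserRepository users)
    {
        _users = users;
    }

    public User Register(string socialSecurityNumber)
    {
        // Check uniqueness before the object is ever created.
        if (_users.ExistsWithSocialSecurityNumber(socialSecurityNumber))
            throw new InvalidOperationException(
                "A user with this social security number already exists.");

        return new User(socialSecurityNumber);
    }
}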

Ad 5.:

I perceive the mapping of domain objects to a storage backend as a work-intensive process compared to having the domain objects modify the underlying data directly. It is, of course, an essential prerequisite for decoupling the concrete storage implementation from the domain code. However, does it really come at such a high cost?

You apparently have the option of using ORM tools to do the mapping for you. However, these often require you to design the domain model according to the ORM's restrictions, or even introduce a dependency from the domain layer to the infrastructure layer (by using ORM annotations in the domain objects, for instance). Also, I've read that ORMs introduce considerable computational overhead.

In the case of NoSQL databases, for which hardly any ORM-like concepts exist, how do you keep track of which properties changed in the domain models upon save()?

Edit: Also, in order for a repository to access the domain object's state (i.e. the value of each field), the domain object needs to reveal its internal state, which breaks encapsulation.
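One workaround I've seen (a sketch; whether it truly preserves encapsulation is debatable) is to have the domain object hand out an explicit snapshot of its state, so the repository relies on a deliberate contract rather than on public fields:

// Hypothetical snapshot type the repository can persist without
// reaching into the domain object's private fields.
public class UserSnapshot
{
    public string Id { get; set; }
    public string SocialSecurityNumber { get; set; }
}

public class User
{
    private readonly string _id;
    private readonly string _socialSecurityNumber;

    public User(string id, string socialSecurityNumber)
    {
        _id = id;
        _socialSecurityNumber = socialSecurityNumber;
    }

    // The single sanctioned way to read the state back out.
    public UserSnapshot ToSnapshot()
    {
        return new UserSnapshot
        {
            Id = _id,
            SocialSecurityNumber = _socialSecurityNumber
        };
    }
}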

In general:

  • Where would transactional logic go? This is certainly persistence-specific. Some storage infrastructure might not even support transactions at all (like in-memory mock repositories).
  • For bulk operations that modify multiple objects, would I have to load, modify and store each object individually in order to run each object's encapsulated validation logic? That is opposed to executing a single query directly against the database.

I would appreciate some clarification on this topic. Are my assumptions correct? If not, what is the correct way of tackling these problems?

Double M
  • Good points and questions; I am also interested in those. One side note - if you are properly modeling the aggregate, that means that at any given time of its existence the aggregate instance must be in a valid state - that is the main point of the aggregate (and not using an aggregate as a composition container). That also means that in order to restore the aggregate from the DB data, the repository itself would usually have to use a specific constructor and a set of mutation operations, and I don't see how any ORM could auto-magically know how to do those operations. – Dusan Oct 14 '18 at 16:10
  • What is even more disappointing is that questions like yours are asked around pretty often, but, to my knowledge, there are ZERO examples of implementations of aggregates and repositories that are by the book. – Dusan Oct 14 '18 at 16:19

2 Answers


Your basic understanding is correct and the architecture you sketch out is good and works well.

Reading between the lines, it seems like you are coming from a more database-centric, active-record style of programming. To get to a working implementation, I would say you need to keep a few things in mind:

1: Domain objects don't have to include the whole object graph. For example I could have:

// Customer references Address by id rather than holding the object itself.
public class Customer
{
    public string AddressId { get; set; }
    public string Name { get; set; }
}

public class Address
{
    public string Id { get; set; }
    public string HouseNumber { get; set; }
}

Address and Customer only need to be part of the same aggregate if you have some logic such as "the customer name can only start with the same letter as the house name". You are right to avoid lazy loading and 'Lite' versions of objects.

2: Uniqueness constraints are generally the purview of the repository, not the domain object. Don't inject repositories into domain objects; that's a move back to active record. Simply raise an error when the service attempts to save.

The business rule isn't "No two instances of User with the same SocialSecurityNumber can exist at the same time ever"

It's that they can't exist in the same repository.
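A minimal sketch of that placement (in-memory for brevity; the User property and the repository shape are assumptions):

using System;
using System.Collections.Generic;

// Hypothetical repository that owns the uniqueness rule; a SQL
// implementation would back this with a unique index instead.
public class InMemoryUserRepository
{
    private readonly Dictionary<string, User> _bySsn =
        new Dictionary<string, User>();

    public void Save(User user)
    {
        // The rule is "no duplicates within this repository",
        // so this is where it is enforced - not in User itself.
        if (_bySsn.ContainsKey(user.SocialSecurityNumber))
            throw new InvalidOperationException(
                "A user with this social security number already exists.");

        _bySsn[user.SocialSecurityNumber] = user;
    }
}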

3: It's not hard to write repositories rather than individual property update methods. In fact, you'll find that you have pretty much the same code either way; it's just a question of which class you put it in.

ORMs these days are easy to use and impose few extra constraints on your code. Having said that, I personally prefer to simply hand-crank the SQL. It's not that hard, you never run into any issues with ORM features, and you can optimise where required.

There really is no need to keep track of which properties changed on save. Keep your domain objects small and simply overwrite the old version.
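A sketch of that overwrite-on-save approach for a document store (the IDocumentStore interface and the Id property are stand-ins, not a real client API):

// Hypothetical document-store client; Upsert replaces the whole document.
public interface IDocumentStore
{
    void Upsert(string id, object document);
}

public class CustomerRepository
{
    private readonly IDocumentStore _store;

    public CustomerRepository(IDocumentStore store)
    {
        _store = store;
    }

    public void Save(Customer customer)
    {
        // No per-property change tracking: the aggregate is small,
        // so the stored version is simply overwritten wholesale.
        _store.Upsert(customer.Id, customer);
    }
}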

General Questions

  1. Transaction logic goes in the repository, but you shouldn't have much, if any, of it. Sure, you need some if you have child tables into which you are putting the child objects of the aggregate, but that will be entirely encapsulated within the SaveMyObject repository method.

  2. Bulk updates: yes, you should individually alter each object, then just add a SaveMyObjects(List<MyObject> objects) method to your repository to do the bulk update.

    You want the domain object or domain service to contain the logic, not the database. That means you can't just do "update customer set name=x where y", because for all you know the Customer object, or CustomerUpdateService, does 20-odd other things when you change the name. See the sketch below.
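Something like this shape (ChangeName and the repository method are illustrative, not prescribed):

using System.Collections.Generic;

public class CustomerService
{
    private readonly ICustomerRepository _repository;

    public CustomerService(ICustomerRepository repository)
    {
        _repository = repository;
    }

    public void RenameCustomers(List<Customer> customers, string newName)
    {
        // Domain logic (validation and those 20-odd side effects)
        // still runs per object...
        foreach (var customer in customers)
            customer.ChangeName(newName);

        // ...but persistence happens in one bulk call.
        _repository.SaveCustomers(customers);
    }
}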

Ewan
  • Great answer. You are absolutely right, I am used to an active record style of coding, which is why the repository pattern seems odd at first sight. However, don't the "lean" domain objects (`AddressId` instead of `Address`) contradict OO principles? – Double M Oct 14 '18 at 17:38
  • Nope, you still have an Address object; it's just not a child of Customer. – Ewan Oct 14 '18 at 18:43
  • ad object mapping without change tracking https://softwareengineering.stackexchange.com/questions/380274/updating-aggregates-without-orm – Double M Oct 19 '18 at 13:06

Short answer: your understanding is correct, and the questions you raise point to valid problems for which the solutions are neither straightforward nor universally accepted.

Point 2.: (loading full object graphs)

I'm not the first to point out that ORMs are not always a good solution. The main problem is that ORMs know nothing about the actual use case, so they have no idea what to load or how to optimize.

As you said, the obvious solution is to have persistence methods for each use case. But if you still use an ORM for that, the ORM will force you to pack everything into data objects, which, apart from not really being object-oriented, is again not the best design for some use cases.

What if I just want to bulk-update some records? Why would I need an object representation for all records? Etc.

So the solution is simply not to use an ORM for use cases it does not fit. Implement a use case "naturally" as it is, which sometimes requires neither an additional "abstraction" of the data itself (data objects) nor an abstraction over the "tables" (repositories).
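For instance, the bulk update mentioned above can be written directly against the database, with neither data objects nor a repository in between (a sketch using plain ADO.NET; the table and column names are invented):

using System.Data.SqlClient;

// Hypothetical use case implemented straight on the database:
// one statement, no object graph materialised at all.
public class DeactivateDormantAccounts
{
    private readonly string _connectionString;

    public DeactivateDormantAccounts(string connectionString)
    {
        _connectionString = connectionString;
    }

    public void Execute()
    {
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand(
            "UPDATE Accounts SET Active = 0 " +
            "WHERE LastLogin < DATEADD(year, -1, GETDATE())",
            connection))
        {
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}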

Having half-filled data objects or replacing object references with "ids" are workarounds at best, not good designs, as you pointed out.

Point 3.: (checking constraints)

If the persistence is not abstracted away, each use case can obviously check whatever constraint it wants, easily. The requirement that objects not know about the "repository" is completely artificial and not a problem of the technology.

Point 5.: (ORMs)

It is, of course, an essential prerequisite to decouple the concrete storage implementation from the domain code. However, does it indeed come at such a high cost?

No, it doesn't. There are a lot of other ways to implement persistence. The problem is that the ORM is seen as "the" solution to use, always (for relational databases at least). Trying to suggest not using it for some use cases in a project is futile, and depending on the ORM itself sometimes even impossible, since these tools may rely on caches and deferred execution.

General question 1.: (transactions)

I do not think there is a single solution. If your design is object-oriented, there will be a "top" method for each use-case. The transaction should be there.

Any other restriction is completely artificial.
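As a sketch of that placement (the use case, the Order type and the repository are made up; TransactionScope is one way to do it in .NET):

using System.Transactions;

public class PlaceOrderUseCase
{
    private readonly IOrderRepository _orders;

    public PlaceOrderUseCase(IOrderRepository orders)
    {
        _orders = orders;
    }

    public void Execute(Order order)
    {
        // The transaction wraps the whole use case, not individual
        // repository calls; an in-memory repository can ignore it.
        using (var scope = new TransactionScope())
        {
            order.Place();        // domain logic (hypothetical)
            _orders.Save(order);  // persistence
            scope.Complete();     // disposing without Complete() rolls back
        }
    }
}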

General question 2.: (bulk operations)

With an ORM, you are (for most ORMs I know) forced to go through individual objects. This is completely unnecessary and probably would not be your design if your hands were not tied by the ORM.

The requirement to separate "logic" from SQL comes from the ORMs. They have to say that, because they can't support anything else. Putting logic into SQL is not inherently "bad".

Summary

I guess my point is that ORMs are not always the best tool for a project, and even when one is, it is highly unlikely to be the best for all use cases in that project.

Similarly, the data-object/repository abstraction of DDD is not always the best either. I would even go so far as to say that it is rarely the optimal design.

That leaves us with no one-size-fits-all solution, so we would have to think about solutions for each use-case individually, which is not good news and obviously makes our work more difficult :)

Robert Bräutigam
  • Very interesting points in there, thank you for confirming my assumptions. You said there are a lot of other ways to have persistence. Can you recommend a performant design pattern to be used with graph databases (no ORM), which would still provide PI? – Double M Oct 18 '18 at 11:36
  • I would actually question whether you need isolation (and what kind) in the first place. Isolating *per technology* (i.e. database, ui, etc.) almost automatically brings the "awkwardness" you are trying to avoid, for the advantage of somewhat easier replacement of the database technology. The cost is, however, a more difficult change of business logic, since that spreads through the layers. *Or* you can split along business functions, which would make changing databases harder, but changing logic easier. Which one do you really want? – Robert Bräutigam Oct 18 '18 at 12:19
  • You can get the best performance if you just model the domain (i.e. business functions) and don't abstract the database (whether relational or graph does not matter). Since the database is not abstracted from the use-case, the use-case can implement the most optimal queries/updates it wants, and doesn't need to go through some awkward object-model to accomplish what it wants. – Robert Bräutigam Oct 18 '18 at 12:21
  • Well, the main goal is to keep the concerns of persistence away from the business logic, in order to have clean code that is easy to understand, expand and test. Being able to swap DB technologies is just a bonus. I can see that there is obviously friction between efficiency and ignorance, which seems to be stronger with graph DBs due to the powerful queries you can (but aren't allowed to) use. – Double M Oct 18 '18 at 12:28
  • As a Java Enterprise developer, I can tell you we've tried to separate persistence from logic for the last two decades. It doesn't work. First, separation was never really achieved. Even today, there are all sorts of database-related stuff in supposedly "business" objects, the main one being the database id (and a lot of database annotations). Second, as you said, sometimes business logic is executed on the database either way. Third, that is the reason we have specific databases: to be able to offload some logic best done where the data is. – Robert Bräutigam Oct 18 '18 at 13:03