I'm diving into Domain-Driven Design, and some of the concepts I'm coming across make a lot of sense on the surface, but when I think about them more I have to wonder whether they're really a good idea.

The concept of Aggregates, for instance, makes sense. You create small domains of ownership so that you don't have to deal with the entire domain model.

However, when I think about this in the context of a web app, we're frequently hitting the database to pull back small subsets of data. For instance, a page may only list the number of orders, with links to click to open each order and see its order IDs.

If I'm understanding Aggregates right, I would typically use the repository pattern to return an OrderAggregate, via a repository that would contain the members GetAll, GetByID, Delete, and Save. OK, that sounds good. But...

If I call GetAll to list all my orders, it would seem to me that this pattern would require the entire list of aggregate information to be returned (complete orders, order lines, etc.), when I only need a small subset of that information (just the header information).

Am I missing something? Or is there some level of optimization you would use here? I can't imagine that anyone would advocate returning entire aggregates of information when you don't need it.

Certainly, one could create methods on your repository like GetOrderHeaders, but that seems to defeat the purpose of using a pattern like repository in the first place.

Can anyone clarify this for me?

EDIT:

After a lot more research, I think the disconnect here is that a pure Repository pattern is different from what most people think of a Repository as being.

Fowler defines a repository as a data store that uses collection semantics, and is generally kept in-memory. This means creating an entire object graph.

Evans alters the Repository to include Aggregate Roots, and thus the repository is amputated to only support the objects in an Aggregate.

Most people seem to think of repositories as glorified Data Access Objects, where you just create methods to get whatever data you want. That doesn't seem to be the intent as described in Fowler's Patterns of Enterprise Application Architecture.

Still others think of a repository as a simple abstraction used primarily to make testing and mocking easier, or to decouple persistence from the rest of the system.

I guess the answer is that this is a much more complex concept than I first thought it was.

Erik Funkenbusch
  • "I guess the answer is that this is a much more complex concept than I first thought it was." This is **very** true. – quentin-starin Feb 16 '11 at 22:35
  • for your situation, you might create a proxy for the aggregate root object that selectively retrieves and caches data only when it is requested – Steven A. Lowe Jun 03 '11 at 15:55
  • My suggestion is to implement lazy load in the root aggregate associations. So you can retrieve a list of roots without loading too many objects. – Juliano Feb 19 '15 at 16:35
  • almost 6 years later, still a good question. After reading the chapter in the red book, I'd say: don't make your aggregates too big. It's tempting to pick some top-level concept from your domain and declare it the Root to Rule Them All but DDD advocates smaller aggregates. And mitigates inefficiencies like the ones you describe above. – Cpt. Senkfuss Nov 20 '16 at 21:35
  • Your aggregates should be as small as possible while being natural and effective to the domain (a challenge!). Furthermore, it is perfectly fine and desirable for your repos to have highly *specific* methods. – Timo Mar 18 '17 at 16:35

6 Answers

38

Don't use your Domain Model and aggregates for querying.

In fact, what you are asking is a common enough question that a set of principles and patterns has been established to avoid just that. It is called CQRS.
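
To make that separation concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. The table layout and the names `OrderHeaderDto`, `OrderHeadersQuery`, `OrderRepository`, and `make_demo_db` are assumptions made up for illustration; they are not taken from any CQRS framework. The query class reads only header columns, while the repository rebuilds the whole aggregate for the command side.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OrderHeaderDto:
    """Read-side shape: only what the listing page needs (hypothetical)."""
    order_id: int
    customer: str


@dataclass
class Order:
    """Write-side aggregate root, loaded together with all of its lines."""
    order_id: int
    customer: str
    lines: list = field(default_factory=list)  # (sku, quantity) tuples


class OrderHeadersQuery:
    """Read side: one narrow query, no aggregate involved."""

    def __init__(self, conn):
        self._conn = conn

    def execute(self):
        rows = self._conn.execute("SELECT id, customer FROM orders ORDER BY id")
        return [OrderHeaderDto(order_id, customer) for order_id, customer in rows]


class OrderRepository:
    """Write side: returns the complete aggregate so a command can act on it."""

    def __init__(self, conn):
        self._conn = conn

    def get_by_id(self, order_id):
        row = self._conn.execute(
            "SELECT id, customer FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        if row is None:
            return None
        lines = self._conn.execute(
            "SELECT sku, quantity FROM order_lines WHERE order_id = ? ORDER BY sku",
            (order_id,),
        ).fetchall()
        return Order(row[0], row[1], lines)


def make_demo_db():
    """Build an in-memory database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
    conn.execute(
        "CREATE TABLE order_lines (order_id INTEGER, sku TEXT, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    conn.executemany(
        "INSERT INTO order_lines VALUES (?, ?, ?)",
        [(1, "A-1", 2), (1, "B-7", 1), (2, "A-1", 5)],
    )
    return conn
```

The listing page would call `OrderHeadersQuery(conn).execute()`, and only a handler that needs to change an order would call `OrderRepository(conn).get_by_id(...)`.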

CodesInChaos
quentin-starin
  • Interesting, I have not found mention of CQRS in any of the typical DDD sources... I'll look into it. – Erik Funkenbusch Feb 14 '11 at 06:21
  • So, after googling a bit on this, it doesn't seem that CQRS is intended to solve the problem I'm referring to. CQRS is about separating command and query operations, and removing state. It seems more like CQRS is designed to solve various consistency problems, not to provide the minimal data needed for an operation (which is basically what I'm discussing) – Erik Funkenbusch Feb 14 '11 at 06:40
  • @Mystere Man: No, it **is** for providing the minimal data needed. That's one of the large purposes of a separated read model. It also helps tackle some concurrency issues up front. CQRS has several benefits when applied to DDD. – quentin-starin Feb 14 '11 at 14:31
  • The purpose of the read side is to have exactly the data you need for each individual task, easily and efficiently available. Also, CQRS can be implemented with fully synchronous, in-process buses, so that eventual consistency doesn't really come into play. – quentin-starin Feb 14 '11 at 14:32
  • Can you provide any references that discuss that? The one you provided says nothing of the sort, and none of the other references I've found mention it either. – Erik Funkenbusch Feb 14 '11 at 16:36
  • @Mystere: I'm quite surprised you'd miss that if you read the article I linked to. See the section titled "Queries (reporting)": "When an application is requesting data ... this should be done in one single call to the Query layer and in return it will get one single DTO containing all the needed data. ... The reason for this is that data is normally queried many times more than domain behavior is being executed, so by optimizing this you will enhance the perceived performance of the system." – quentin-starin Feb 14 '11 at 16:41
  • Seriously, it's a central tenet of CQRS. What has to happen when we modify data is vastly different from what has to happen to read data, so CQRS (like CQS before it) aims to separate these concerns. The form CQRS often takes seeks to make this separation at a broad architectural level. Perhaps the CQRS Info page here on SO would help clarify for you what it is and is not: http://stackoverflow.com/tags/cqrs/info – quentin-starin Feb 14 '11 at 16:43
  • I guess your familiarity with the subject allows you to infer meaning that my unfamiliarity does not. I don't see the text you quoted as meaning the same thing you believe it does. "all the needed data" is not the same thing as "only the needed data". The way I see repositories working, they return all the data from the aggregate, not just the data you need. – Erik Funkenbusch Feb 14 '11 at 16:49
  • Which is why we wouldn't use a Repository to *read* data in a CQRS system. We'd write a simple class to encapsulate a query (using whatever tech was convenient or needed, often straight ADO.Net or Linq2Sql or SubSonic fits well here), returning just (all) the data needed for the task at hand, to avoid dragging data through all the normal layers of a DDD Repository. We would only use a Repository to retrieve an Aggregate if we wanted to send a command to the domain. – quentin-starin Feb 14 '11 at 16:52
  • @qstarin- I disagree with the class per query approach for a lot of reasons. First, it creates a large amount of boilerplate code that you have to rewrite or copy (or generate). Second, it leads to an explosion of classes and I don't see how having 50 classes that do variations of the same thing are any better than 1 class with 50 methods. Even taking SRP into account, it seems to add complexity to the system as a whole. I think there's some middle ground, possibly with specifications that might be better. Also, one class per query makes it more difficult to substitute other persistence later. – Erik Funkenbusch Feb 16 '11 at 21:50
  • class per query is an implementation detail. Acknowledging and accepting the fact that a domain model does not usually support queries well is more to the point. Besides, I don't always use one class per query, sometimes it makes sense to group a few related queries together. Also, I find a class per query is less boilerplate than mapping entities to DTO's - the only boilerplate I see is the same set of using's (filled in automatically when I create the class file) and an Execute method. The meat is usually a simple LINQ statement. – quentin-starin Feb 16 '11 at 22:28
  • "I can't imagine that anyone would advocate returning entire aggregates of information when you don't need it." I'm trying to say that you are exactly correct with this statement. Do not retrieve an entire aggregate of information when you do not need it. This is the very core of CQRS applied to DDD. You don't need an aggregate to query. Get the data through a different mechanism, and then do that consistently. – quentin-starin Feb 16 '11 at 22:37
  • @qes Indeed, the best solution is to not use DDD for queries (read) :) But you still use DDD in the Command part, i.e. for storing or updating data. So I have a question for you: do you always use Repositories with Entities when you need to update data in the DB? Let's say you need to change only one small value in a column (a switch of some sort). Do you still load the whole Entity in the App layer, change one value (property) and then save the whole Entity back into the DB? A bit of overkill, too? – Andrew Jul 11 '15 at 09:43
  • @Andrew Depends on whether the small value change is relevant to the business rules. If it is, then yes, you load the entire Entity to enforce the rules and fire subsequent events etc. If not, then it's a CRUD operation where DDD is not required and the value probably doesn't even belong in the domain model in the first place. Remember to use the right tool for the job. DDD is not for everything and everywhere. That's what CQRS is about, and separating the CRUD operations as well. It's perfectly fine to use DDD for one bounded context and pure CRUD for another within the same application. – Tuukka Haapaniemi Sep 04 '15 at 13:06
12

I struggled, and am still struggling, with how to best use the repository pattern in a Domain Driven Design. After using it now for the first time, I came up with the following practices:

  1. A repository should be simple; it is only responsible for storing domain objects and retrieving them. All other logic should be in other objects, like factories and domain services.

  2. A repository behaves like a collection as if it's an in memory collection of aggregate roots.

  3. A repository is not a generic DAO, each repository has its unique and narrow interface. A repository often has specific finder methods that allow you to search the collection in terms of the domain (for example: give me all open orders for user X). The repository itself can be implemented with the help of a generic DAO.

  4. Ideally the finder methods will return only aggregate roots. If that's too inefficient, they can also return read-only value objects that contain exactly what you need (although it’s a plus if these value objects can also be expressed in terms of the domain). As a last resort the repository can also be used to return subsets or collections of subsets of an aggregate root.

  5. Choices like these depend on the technologies used, as you need to find a way to most efficiently express your domain model with the technologies used.
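
The practices above could look roughly like this in Python. This is only a sketch under my own assumptions: `GenericDao`, `OrderSummary`, and the finder names are hypothetical, and the DAO is an in-memory stand-in so the example stays self-contained.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Aggregate root carrying its full detail."""
    order_id: int
    user: str
    status: str   # "open" or "closed"
    lines: tuple


@dataclass(frozen=True)
class OrderSummary:
    """Read-only value object for when full roots are too expensive."""
    order_id: int
    line_count: int


class GenericDao:
    """Stand-in for a generic data-access helper (in-memory for this sketch)."""

    def __init__(self):
        self._rows = {}

    def put(self, key, obj):
        self._rows[key] = obj

    def get(self, key):
        return self._rows.get(key)

    def all(self):
        return list(self._rows.values())


class OrderRepository:
    """Narrow, order-specific interface implemented on top of the generic DAO."""

    def __init__(self, dao):
        self._dao = dao

    def add(self, order):
        self._dao.put(order.order_id, order)

    def get(self, order_id):
        return self._dao.get(order_id)

    def open_orders_for(self, user):
        # Finder expressed in domain terms; returns aggregate roots.
        return [o for o in self._dao.all() if o.user == user and o.status == "open"]

    def open_order_summaries_for(self, user):
        # Fallback finder: read-only value objects instead of full roots.
        return [
            OrderSummary(o.order_id, len(o.lines))
            for o in self.open_orders_for(user)
        ]
```

In this in-memory version the summaries are derived from full roots, so nothing is saved; against a real database the second finder would presumably be backed by a narrower query, with the same signature.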

Kdeveloper
  • It's definitely a complex subject for sure. It's hard to turn theory into practice, especially when it's combining two distinct and separate theories into a single practice. – Sebastian Patten Sep 03 '14 at 18:54
7

I don't think your GetOrderHeaders method defeats the purpose of the repository at all.

DDD is concerned (among other things) with ensuring that you get what you need by way of the aggregate root (you wouldn't have an OrderDetailsRepository, for instance), but it doesn't limit you in the way you are mentioning.

If an OrderHeader is a Domain concept, then you should have it defined as such and have the appropriate repository methods for retrieving them. Just make sure that you're going through the correct aggregate root when you do.

Eric King
  • Perhaps I'm confusing concepts here, but my understanding of the repository pattern is to decouple the persistence from the domain, by use of a standard interface for persistence. If you have to add custom methods for a specific feature, that seems to be coupling things back up again. – Erik Funkenbusch Feb 14 '11 at 06:19
  • The persistence *mechanism* is decoupled from the domain, but not what is being persisted. If you find yourself saying things like "we need to list the Order Headers here", then you need to model OrderHeader in your Domain and provide a way to retrieve them from your repository. – Eric King Feb 14 '11 at 13:23
  • Also, don't get hung up on the "standard interface for persistence". There's no such thing as a generic repository pattern that will suffice for all possible apps. Each app will have many repository methods beyond the standard "GetById", "Save", etc. Those methods are the starting point, not the end point. – Eric King Feb 14 '11 at 13:27
6

Your domain model contains your business logic in its purest form: all the relationships and operations that support business operations. What you're missing from your conceptual map is the idea of the Application Service Layer. The service layer wraps around the domain model and provides a simplified view of the business domain (a projection, if you will) that allows the domain model to change as needed without directly impacting the applications using the service layer.

Going further. The idea of the aggregate is that there is one object, the aggregate root, responsible for maintaining consistency of the aggregate. In your example, the order would be responsible for manipulating its order lines.
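
As a rough illustration of a root guarding its own lines, here is a small Python sketch. The spending cap (`max_total`) is an invented business rule, and the class and method names are hypothetical; the point is only that every change to the lines goes through the root, which checks the invariant first.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OrderLine:
    sku: str
    quantity: int
    unit_price: int  # in cents, to avoid float rounding


@dataclass
class Order:
    """Aggregate root: the only entry point for changing order lines."""
    order_id: int
    max_total: int  # hypothetical business rule, in cents
    _lines: list = field(default_factory=list)

    @property
    def lines(self):
        # Hand out an immutable copy so callers cannot bypass the root.
        return tuple(self._lines)

    @property
    def total(self):
        return sum(line.quantity * line.unit_price for line in self._lines)

    def add_line(self, sku, quantity, unit_price):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.total + quantity * unit_price > self.max_total:
            raise ValueError("order would exceed its allowed total")
        self._lines.append(OrderLine(sku, quantity, unit_price))

    def remove_line(self, sku):
        self._lines = [line for line in self._lines if line.sku != sku]
```

A caller would write `order.add_line("A-1", 2, 300)` rather than appending to a list of lines directly, so the rule cannot be skipped by accident.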

For your example, the service layer would expose an operation like GetOrdersForCustomer that would only return what's needed to view a summary listing of the orders (as you call them OrderHeaders).

Finally, the Repository pattern isn't JUST a collection, but also allows for declarative queries. In C# you can use LINQ as the Query Object, or most other O/RMs provide a Query Object specification as well.

A Repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection. Client objects construct query specifications declaratively and submit them to Repository for satisfaction. (from Fowler's Repository page)

Seeing that you can create queries against the repository, it also makes sense to provide convenience methods that handle common queries. I.e. if you just want the headers of your order, you can create a query that returns just the header and expose it from a convenience method in your repositories.
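
Since LINQ has no direct Python equivalent, the following is only a loose sketch of the same idea with a hand-rolled specification object. `Spec`, `for_customer`, `with_status`, and the repository methods are all hypothetical names: the repository accepts a declarative query, and a convenience method wraps a common one.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    customer: str
    status: str


class Spec:
    """A tiny composable query specification (illustrative only)."""

    def __init__(self, predicate):
        self._predicate = predicate

    def is_satisfied_by(self, candidate):
        return self._predicate(candidate)

    def __and__(self, other):
        # Combine two specifications; both must hold.
        return Spec(
            lambda c: self.is_satisfied_by(c) and other.is_satisfied_by(c)
        )


def for_customer(name):
    return Spec(lambda order: order.customer == name)


def with_status(status):
    return Spec(lambda order: order.status == status)


class OrderRepository:
    """In-memory stand-in; a real one would translate the spec into a query."""

    def __init__(self, orders):
        self._orders = list(orders)

    def matching(self, spec):
        # General entry point: clients submit a specification.
        return [o for o in self._orders if spec.is_satisfied_by(o)]

    def open_order_ids_for(self, customer):
        # Convenience method for a common query, returning only the IDs.
        spec = for_customer(customer) & with_status("open")
        return [o.order_id for o in self.matching(spec)]
```

With this shape, `repo.matching(for_customer("alice"))` covers ad hoc queries, while `repo.open_order_ids_for("alice")` gives callers a named shortcut for the header-style listing.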

Hope this helps clarify things.

Michael Brown
6

My use of DDD may not be considered "pure" DDD but I have adapted the following real world strategies using DDD against a DB data store.

  • An aggregate root has an associated repository
  • The associated repository is only used by that aggregate root (it is not publicly available)
  • A repository can contain query calls (e.g. GetAllActiveOrders, GetOrderItemsForOrder)
  • A service exposes a public subset of the repository and other non-crud operations (e.g. Transfer Money from one bank account to another, LoadById, Search / Find, CreateEntity, etc.).
  • I use the Root -> Service -> Repository stack. A DDD service is only supposed to be used for anything an Entity can't answer itself (e.g. LoadById, TransferMoneyFromAccountToAccount), but in the real world I tend to also stick in other CRUD-related services (Save, Delete, Query) even though the root should be able to "answer/perform" these itself. Note that there is nothing wrong with giving an entity access to another aggregate root's service! However, remember that you would not include a call like GetOrderItemsForOrder in a service; you would include it in the repository so that the Aggregate Root can make use of it. Note that a service shouldn't expose any open queries like the repository can.
  • I usually define a Repository abstractly in the domain model (via interface) and provide a separate concrete implementation. I fully define a service in the domain model injecting in a concrete repository for its use.

** You do not have to bring back an entire aggregate. However, if you want more you have to ask the root, not some other service or repository. This is lazy loading, and it can either be done manually with poor man's lazy loading (injecting the appropriate repository/service into the root) or by using an ORM that supports this.
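
A minimal Python sketch of the manual variant, assuming an in-memory repository and a made-up `line_loads` counter that exists only to show when the load happens. The root receives a loader callable from the repository and fetches its items the first time they are asked for.

```python
class Order:
    """Aggregate root that loads its items only on first access."""

    def __init__(self, order_id, customer, load_lines):
        self.order_id = order_id
        self.customer = customer
        self._load_lines = load_lines  # injected callable (hypothetical)
        self._lines = None             # None means "not loaded yet"

    @property
    def lines(self):
        if self._lines is None:
            self._lines = self._load_lines(self.order_id)
        return self._lines


class OrderRepository:
    """In-memory stand-in for a database-backed repository."""

    def __init__(self, headers, lines_by_order):
        self._headers = headers                # {order_id: customer}
        self._lines_by_order = lines_by_order  # {order_id: [items]}
        self.line_loads = 0                    # counts simulated detail queries

    def get_order_items_for_order(self, order_id):
        self.line_loads += 1
        return list(self._lines_by_order.get(order_id, []))

    def get_all(self):
        # Only header data is read here; items are deferred to the root.
        return [
            Order(order_id, customer, self.get_order_items_for_order)
            for order_id, customer in self._headers.items()
        ]
```

Listing orders with `get_all()` touches no item data; the first read of `order.lines` triggers one load for that order, and later reads reuse the result.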

In your example, I would probably provide a repository call that brought just the order headers if I wanted to load the details on a separate call. Note that by having an "OrderHeader" we are actually introducing an additional concept into the domain.

Mike Rowley
0

I know this is an old question but I appear to have come to a different answer.

When I make a Repository it's generally wrapping some cached queries.

Fowler defines a repository as a data store that uses collection semantics, and is generally kept in-memory. This means creating an entire object graph.

Keep those repositories in your server's RAM. They're not just pass-through objects to the database!

If I'm in a web application with a page listing orders, that you can click on to see details, chances are I'm going to want my order listing page to have details about the orders (ID, Name, Amount, Date) to help a user decide which one they want to look at.

At this point you have two options.

  1. You can query the database and pull back exactly what you need to make the listing, then query again to pull the individual details you'd need to see on the detail page.

  2. You can make 1 query that pulls back all the information and caches it. On the next page request you read from the server's RAM instead of the database. If the user hits back or selects the next page you're still making zero trips to the database.

In reality, how you implement it is just that: an implementation detail. If my biggest user has 10 orders I probably want to go with option 2. If I'm talking 10,000 orders then option 1 is needed. In both of the above cases, and in many other cases, I want the repository to hide that implementation detail.

Going forward, if I get a ticket to tell the user how much they have spent on orders (aggregated data) in the last month on the order listing page, would I rather write the logic to calculate that in SQL and make yet another round trip to the DB, or calculate it using the data that's already in the server's RAM?
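
Here is a rough Python sketch of option 2 and of computing a total from what is already cached. `OrderStore` is a made-up stand-in for the database that counts round trips, and the dictionary shape of an order is an assumption; there is no cache invalidation or date filtering, which a real implementation would need.

```python
class OrderStore:
    """Stand-in for the database; counts round trips for illustration."""

    def __init__(self, orders_by_user):
        self._orders_by_user = orders_by_user
        self.round_trips = 0

    def fetch_orders(self, user):
        self.round_trips += 1
        return list(self._orders_by_user.get(user, []))


class CachedOrderRepository:
    """Option 2: one query per user, then serve every request from memory."""

    def __init__(self, store):
        self._store = store
        self._cache = {}

    def orders_for(self, user):
        # Hit the store only the first time a user's orders are requested.
        if user not in self._cache:
            self._cache[user] = self._store.fetch_orders(user)
        return self._cache[user]

    def order_detail(self, user, order_id):
        # Detail page reads from the cached list, not the database.
        return next(
            (o for o in self.orders_for(user) if o["id"] == order_id), None
        )

    def total_spent(self, user):
        # Aggregated data computed in memory, with no extra round trip.
        return sum(o["amount"] for o in self.orders_for(user))
```

Callers only see `orders_for`, `order_detail`, and `total_spent`, so swapping in a per-page query strategy for a user with 10,000 orders would not change the calling code.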

In my experience domain aggregates offer huge benefits.

  • They are a huge part of code reuse that actually works.
  • They simplify code by keeping your business logic right in the core layer instead of having to drill through an infrastructure layer to get the sql server to do it.
  • They can also dramatically speed up your response times by reducing the number of queries you need to make since you can easily cache them.
  • The SQL that I'm writing is often way more maintainable since I'm often just asking for everything and calculating server side.
WhiteleyJ