Best practice or design patterns for retrieval of data for reporting and dashboards in a domain-rich application

Question

First, I want to say this seems to be a neglected question/area, so if this question needs improvement, help me make this a great question that can benefit others!

In my experience, there are two sides to an application: the "task" side (where users interact with the application and its objects, and where the domain model lives) and the reporting side, where users get data based on what happens on the task side.

On the task side, it's clear that an application with a rich domain model should keep its business logic in the domain model and use the database mostly for persistence. Separation of concerns: plenty of books have been written about it, we know what to do, awesome.

What about the reporting side? Are data warehouses acceptable, or are they bad design because they incorporate business logic in the database and in the very data itself? To aggregate the data from the database into data warehouse data, you must have applied business logic and rules to the data, and those rules didn't come from your domain model; they came from your data-aggregation processes. Is that wrong?

I work on large financial and project management applications where the business logic is extensive. When reporting on this data, I often have a LOT of aggregations to do to pull the information required for the report/dashboard, and the aggregations have a lot of business logic in them. For performance's sake, I have been doing it with highly aggregated tables and stored procedures.

As an example, let's say a report/dashboard is needed to show a list of active projects (imagine 10,000 projects). Each project will need a set of metrics shown with it, for example:

total budget
effort to date
burn rate
budget exhaustion date at current burn rate
etc.
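
To make the last two metrics concrete, here is a minimal sketch in Python. It assumes that burn rate means average spend per calendar day since the project started, and that the exhaustion date is today plus the remaining budget divided by that rate. Both definitions and all function names are illustrative assumptions, not the actual rules of the application described here.

```python
import math
from datetime import date, timedelta
from typing import Optional


def burn_rate(spent_to_date: float, start: date, today: date) -> float:
    """Average spend per calendar day since the start date (assumed definition)."""
    elapsed_days = (today - start).days
    if elapsed_days <= 0:
        return 0.0
    return spent_to_date / elapsed_days


def budget_exhaustion_date(total_budget: float, spent_to_date: float,
                           start: date, today: date) -> Optional[date]:
    """Date the budget runs out if spending continues at the current burn rate."""
    rate = burn_rate(spent_to_date, start, today)
    if rate <= 0:
        return None  # nothing is being spent, so no exhaustion date can be projected
    remaining = total_budget - spent_to_date
    if remaining <= 0:
        return today  # budget is already exhausted
    # Round up to whole days, since a date has no fractional part.
    return today + timedelta(days=math.ceil(remaining / rate))
```

Even this simplified version has edge cases (no spend yet, budget already exceeded) that have to be handled identically wherever the metric is computed.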

Each of these involves a lot of business logic, and I'm not just talking about multiplying numbers or some simple logic. To get the budget, for example, you have to apply a rate sheet with 500 different rates, one for each employee's time (on some projects; others have a multiplier), then apply expenses and any appropriate markup, etc. The logic is extensive. It took a lot of aggregating and query tuning to get this data in a reasonable amount of time for the client.
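
A heavily simplified sketch of that kind of rule, written in Python as it might look in a domain model. The class names, the per-project multiplier, and the per-expense markup field are hypothetical and stand in for much more involved rules.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable, Mapping


@dataclass(frozen=True)
class TimeEntry:
    employee_id: str
    hours: Decimal


@dataclass(frozen=True)
class Expense:
    amount: Decimal
    markup: Decimal = Decimal("0")  # e.g. Decimal("0.10") for a 10% markup


def labor_cost(entries: Iterable[TimeEntry],
               rate_sheet: Mapping[str, Decimal],
               multiplier: Decimal = Decimal("1")) -> Decimal:
    """Hours times the employee's rate from the rate sheet, scaled by a project multiplier."""
    return sum((e.hours * rate_sheet[e.employee_id] * multiplier for e in entries),
               Decimal("0"))


def expense_cost(expenses: Iterable[Expense]) -> Decimal:
    """Each expense plus its markup."""
    return sum((x.amount * (1 + x.markup) for x in expenses), Decimal("0"))


def project_total(entries, expenses, rate_sheet, multiplier=Decimal("1")) -> Decimal:
    """Labor plus marked-up expenses for one project (assumed composition)."""
    return labor_cost(entries, rate_sheet, multiplier) + expense_cost(expenses)
```

Computing this per project in the application means loading every time entry and expense for all 10,000 projects, which is where the performance concern comes from.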

Should this be run through the domain first? What about performance? Even with straight SQL queries, I'm barely getting this data to the client fast enough to display in a reasonable amount of time. I can't imagine getting it there fast enough if I am rehydrating all these domain objects and then mixing, matching, and aggregating their data in the application layer.

It seems that in these cases SQL is good at crunching data, so why not use it? But then you have business logic outside your domain model. Any change to the business logic will have to be made in both your domain model and your reporting aggregation schemes.
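
As an illustration of that trade-off, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table and column names are hypothetical, and the single join stands in for what would really be a set of stored procedures and aggregate tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rate (employee_id TEXT PRIMARY KEY, hourly_rate REAL);
    CREATE TABLE time_entry (project_id INTEGER, employee_id TEXT, hours REAL);
    INSERT INTO rate VALUES ('alice', 100.0), ('bob', 80.0);
    INSERT INTO time_entry VALUES
        (1, 'alice', 10.0), (1, 'bob', 5.0), (2, 'bob', 2.5);
""")

# The "hours times rate" rule is expressed here in SQL, so it exists
# separately from any copy of the same rule in the domain model.
rows = conn.execute("""
    SELECT t.project_id, SUM(t.hours * r.hourly_rate) AS labor_cost
    FROM time_entry AS t
    JOIN rate AS r ON r.employee_id = t.employee_id
    GROUP BY t.project_id
    ORDER BY t.project_id
""").fetchall()

for project_id, cost in rows:
    print(project_id, cost)
```

The database returns one row per project without materializing any domain objects, but the rule now has to be kept in sync in two places.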

I'm really at a loss for how to design the reporting/dashboard part of any application with respect to domain-driven design and good practices.

I added the MVC tag because MVC is the design flavor du jour and I am using it in my current design, but can't figure out how the reporting data fits into this type of application.

I'm looking for any help in this area: books, design patterns, keywords to google, articles, anything. I can't find any information on this topic.

Oftentimes abstraction comes at some cost of performance. In this case I would view a data warehouse as a highly optimized version of function memoization: derived pieces of data that you need for reporting are cached away and fetched as you need them. It still seems like the code that generates the data warehouse tables may logically make sense to group with some of the reporting libraries, since they represent working with the same abstractions. Changing two things in one area of code because of different domains seems ok-ish (a necessary sacrifice for performance), but not two areas. — J Trana, Jan 23 '14 at 06:05
Please do not cross-post. See https://programmers.stackexchange.com/questions/225145/best-practice-or-design-patterns-for-retrieval-of-data-for-reporting-and-dashboa , https://programmers.stackexchange.com/questions/225153/best-practice-or-design-patterns-for-retrieval-of-data-for-reporting-and-dashboa — phresnel, Jan 23 '14 at 07:22
Ummmmmm...@phresnel The mod on SO told me that I should post this here because he said the other one couldn't be migrated for whatever reason. Here is what he said in my inbox - ah, and of course it has now disappeared from my inbox. Not sure what's up with that. AND it looks like they were able to migrate the post to programmers. Strange. — richard, Jan 23 '14 at 07:38
Can you merge this with the other one? I want to keep comments/answers from both. I didn't mean to have two of them. — richard, Jan 23 '14 at 07:40
