Handling retroactively adjusted data that is reported hierarchically

Question

Several of our customers have come to us with an interesting problem that involves adjusting data that occurred in the past which is rolled up and reported based on an org hierarchy.

For example: If you run a report by "Sales Team" in March it will roll up sales closed by members currently assigned to that team. Then some team members switch teams at the start of April. Then a few days after some data for March is adjusted due to business-specific reasons (returns, audits, etc.) If someone reruns the same team-based report they will get drastically different results now besides the data adjustments because team members have moved around.

We currently have our own solution to this issue which involves a significant amount of denormalization and updating "effective" time stamps whenever someone moves around in the hierarchy. I'd love learn about how others have solved this but I've struggled to even find examples of other CRMs or reporting platforms that even attempt to support this issue.

What software have you see that supports this? Is this super uncommon and not worth the time to solve for most businesses? Or is this issue a symptom of a deeper problem?

I don't have enough experience with the following to feel comfortable posting ot as an answer, but what you're looking for is a point-in-time architecture. A database model storing the current and paste state of everything. — Sjoerd Job Postmus, Mar 20 '16 at 18:32
@Sjoerd Job Postmus Thanks for the term. That will be helpful to search for. Have you seen any existing CRM/SaaS systems support this in their products? — Vyrotek, Mar 20 '16 at 18:33

Doc Brown · Answer 1 · 2016-03-21T06:55:47.023

When you have time-dependent reports, which depend significantly on other time-dependent data like "when did which a team member belong to a team", you need to track the history of all relevant records if you want to get the reports right. And, of course, when you run the report, you have to tell the system for which date you want the report, so if you run the report on April the 10th, but want a report still for 31/03, you will get exactly that.

So in your example, you could have three tables "Person", "Team", and "Membership", where membership needs to contain date range attributes. In this approach, there is no denormalization involved. With this model, it should be straightforward how to include the given date for which the report is requested into the query.

It seems you believe this has something to do with the organizational hierarchy model - that is probably just a misconception, because that part of your model currently does not track history correctly. The same problem would appear with any other kind of data, if proper date/time stamp fields were missing.

For some kind of systems it is ok to avoid the explicit modeling of dates, because every month or quarter you make a full freeze of the data, set up a new copy of the database, and run reports against the freezed version of the db. But in general this approach is often not suitable, since you need finer granularity or do want not take the burden of dealing with these freezed database copies.

If you are in a situation where you cannot easily change the core data model (maybe your CRM is a third party product, as you mentioned), instead of freezing the db at a whole, it may be more suitable to make an automatic snapshot of all relevant data in your database to some extra tables at the end of each month and run your reports against this snapshot. These tables could be part of your "enterprise db", maybe in a separate schema "per month", or they could reside in a special, lightweight "reporting database". The controlling parameters which might be changed afterwards for "business reasons" should be made part of the snapshot, too, but you have to offer a feature to adjust them afterwards without changing the "core data".

Separating the reporting from the "standard OLTP processing" can have some additional benefits regarding performance and availability, especially if this is a "big" system.

It sounds like we took the right approach. With the right timestamps it can all be calculated. Our requirements are a bit more complicated which is why I mentioned denormalization. We actually have to support N team deep. Those sub-teams move around to to other teams too. This has a ripple affect on everyone below. Trying to calculate a single individual's hierarchy is expensive and so we flatten the whole thing. Im surprised by the lack of support from existing CRM/SaaS software. I suppose companies who need this end up writing their own reporting? I just don't want to reinvent the wheel. — Vyrotek, Mar 20 '16 at 18:09
@Vyrotek: I've worked for a couple of companies that just generate raw figures (aggregates) by department by quarter, then those figures go into spreadsheets maintained by the departments where they keep track of individual assignments. Messy, labor-intensive, and error-prone? Yes, yes and yes. However, it's very flexible and permits departments to "run things their way". — TMN, Mar 22 '16 at 18:32

score 2 · Answer 2 · answered Mar 20 '16 at 19:33

One solution to this is to report off 'events' sometimes called domain events.

Rather than selecting from a relational database and having to ensure that it correctly models history, write your sales events to a flat table as they happen and run the report off that table.

Then if the events need correcting, say they were recorded against team X when really they should have been assigned to team Y you can easily edit the event data and re run the report.

You can obviously generate the data after the fact rather than recording it as it happens, although this can lead to choke points when you need to generate large amounts of data.

score 1 · Answer 3 · answered Mar 22 '16 at 18:08

Look into 'Event Sourcing Pattern/Architecture'. Rather than storing object states, you are recording a stream of events, which can be used to re-create the object state.

You could now inject your custom event of a member changing teams after the fact, and regenerate your objects based on this modified stream of events.

Using snapshots can help with performance when you are dealing with a lot of events as well.

Handling retroactively adjusted data that is reported hierarchically

3 Answers3