
Scenario:
An application is going to perform operations on an entity. That means retrieving it from storage, possibly making modifications, and then persisting those changes back to storage.

I'm hearing conflicting approaches.

  1. We should retrieve an entire entity from storage (e.g., through a repository), let our application logic modify it (setting its properties or calling its methods, whichever), and then save the entire entity.
  2. We should only query the properties we need to execute application logic. When we save changes, we should have storage methods for saving only the modified properties. This will allow concurrent processes to update different properties without blocking each other.
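
To make the two options concrete, here is a minimal Python sketch against a hypothetical in-memory store. The `Technician` fields, `InMemoryStore`, and its `get`/`save`/`get_fields`/`update_fields` methods are illustrative assumptions, not an existing API; a real repository would sit on a database or ORM.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, replace


@dataclass
class Technician:
    id: int
    name: str
    region: str
    works_on_site: bool
    skills: frozenset[str]


class InMemoryStore:
    """Hypothetical storage backend; not a real library API."""

    def __init__(self) -> None:
        self._rows: dict[int, Technician] = {}

    # Approach 1: load the whole entity, let the caller modify it, save it whole.
    def get(self, tech_id: int) -> Technician:
        return copy.deepcopy(self._rows[tech_id])

    def save(self, tech: Technician) -> None:
        self._rows[tech.id] = copy.deepcopy(tech)

    # Approach 2: read and write only the properties a use case needs.
    def get_fields(self, tech_id: int, *names: str) -> dict:
        row = self._rows[tech_id]
        return {name: getattr(row, name) for name in names}

    def update_fields(self, tech_id: int, **changes) -> None:
        # dataclasses.replace copies the row, overwriting only the given fields.
        self._rows[tech_id] = replace(self._rows[tech_id], **changes)


store = InMemoryStore()
store.save(Technician(1, "Ana", "North", True, frozenset({"networking"})))

# Approach 1: the use case sees (and re-saves) every property.
tech = store.get(1)
tech.region = "South"
store.save(tech)

# Approach 2: the use case reads and writes only `works_on_site`.
was_on_site = store.get_fields(1, "works_on_site")["works_on_site"]
store.update_fields(1, works_on_site=False)
```

With approach 2, two callers using `update_fields` on different fields never overwrite each other's column, which is the non-blocking behaviour the question describes; whether that is safe depends on whether any business rule spans those fields.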

This assumes a predominantly anemic domain model. It's more likely that application (domain) logic will read and set properties on entities.

I realize that no two scenarios are the same. What factors would you consider when deciding which approach to take?


I can't describe the actual scenario. Imagine building a network of computer technicians who work both on-site and remotely: determining who is available to work where and when and what their skills are, and deciding where and when to assign them so that they match the expected demand for appointments. Scheduling involves approval workflows.

The result of errors might be things like scheduling someone for appointments in one area when they aren't going to be there, scheduling too many technicians in one area where there's not enough need, scheduling an on-site session for someone who works remotely, or scheduling an appointment with a technician who doesn't have the right skills.

A technician might say they're available but need to change it. Or their skills might change, affecting the types of appointments they can do.
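
The failure modes above are rules that span several properties at once: availability, location, on-site vs. remote, and skills. As a hedged illustration, assuming a hypothetical `Technician`/`Appointment` model (the names and the day-to-region availability map are invented here), a check like the following needs a consistent view of all of those properties. Loading the whole entity (approach 1) provides that view in one read; under approach 2 each property would be queried, and could change, independently.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Appointment:
    day: date
    region: str
    on_site: bool
    required_skill: str


@dataclass
class Technician:
    name: str
    works_on_site: bool
    skills: set[str]
    # Hypothetical availability model: the region a technician is in on each day.
    availability: dict[date, str] = field(default_factory=dict)

    def problems_with(self, appt: Appointment) -> list[str]:
        """Return every rule the proposed assignment would break."""
        problems = []
        region = self.availability.get(appt.day)
        if region is None:
            problems.append("not available that day")
        elif appt.on_site and region != appt.region:
            problems.append(f"in {region}, not {appt.region}")
        if appt.on_site and not self.works_on_site:
            problems.append("works remotely only")
        if appt.required_skill not in self.skills:
            problems.append(f"lacks skill {appt.required_skill!r}")
        return problems


tech = Technician("Ana", works_on_site=False, skills={"networking"},
                  availability={date(2023, 6, 1): "North"})

# On-site, wrong region, wrong skill: three rules broken at once.
problems = tech.problems_with(Appointment(date(2023, 6, 1), "South", True, "printers"))
# Remote session matching her skill on a day she is available: no problems.
remote_ok = tech.problems_with(Appointment(date(2023, 6, 1), "South", False, "networking"))
```

If a technician edits their availability or skills while an assignment is awaiting approval, the check can still go stale under either approach, which is why the concurrency-control question raised in the comments matters.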

Scott Hannen
  • What's more critical in your system? Data consistency (1) or performance (2)? – Laiv May 20 '23 at 07:21
  • Data integrity. It could grow a great deal and I wouldn't consider it to be a high-traffic application. It's not at the point where we're separating read/write databases and that's unlikely to be a need for years if ever. – Scott Hannen May 20 '23 at 13:13
  • Do you expect high concurrency over data subsets? Say, multiple requests updating stocks or multiple users working on the same "order"? Is `concurrent processes to update different properties without blocking each other` allowed or possible? Do you have any concurrency control policy for these cases? – Laiv May 20 '23 at 14:07
  • Whether concurrent processes should be able to update different properties without blocking each other is actually part of the question. Should that be allowed? (I'm trying hard not to say what I already think.) I realize that lots of specifics are missing. I'll update the question with a vaguely similar scenario. – Scott Hannen May 20 '23 at 16:24
  • If the objects are or can be (perhaps after some rethinking) modeled as value objects (that is, the ID is not what's important about them), and if completely replacing them makes the logic easier to implement, understand and maintain, then save/replace the entire domain object (or the entire set of domain objects). – Filip Milovanović May 20 '23 at 17:38
  • You're right - that was self-contradicting. I removed that question leaving only "what factors would you consider?" – Scott Hannen May 21 '23 at 13:30
  • The topic of the question seems to be a bit far removed from the specific scenario here. There are multiple ways to tackle high concurrency processes, some of which use A and some of which use B. A/B are not the sole deciding factor here. You need to consider the overall architecture. The architecture should not be decided purely in pursuit of the A/B decision, as the choice of architecture will have significantly further reaching impact. You need a bigger picture here before you can make any reasonable decision about this problem. – Flater May 22 '23 at 02:27
  • Although the approaches are completely different, it's not difficult to switch between them or even do both at once. I'm not advocating that. But it's a decision that can be implemented and changed in a small scope, even though the effects may be far-reaching. (I should add that I have an opinion on the answer to my question. I'm leaving it out because I don't want to influence any answer.) – Scott Hannen May 22 '23 at 14:03
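
Filip Milovanović's suggestion can be sketched as follows, assuming a technician's availability is modelled as a set of value objects. `AvailabilitySlot` and `AvailabilityStore` are hypothetical names; the point is that the whole set is replaced on save, so there is no per-slot identity to track or diff.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AvailabilitySlot:
    """Value object: two slots with the same contents are the same slot."""
    day: date
    region: str
    on_site: bool


class AvailabilityStore:
    """Hypothetical store holding each technician's full set of slots."""

    def __init__(self) -> None:
        self._slots: dict[int, frozenset[AvailabilitySlot]] = {}

    def load(self, tech_id: int) -> frozenset[AvailabilitySlot]:
        return self._slots.get(tech_id, frozenset())

    def replace_all(self, tech_id: int, slots) -> None:
        # No per-slot updates: the new set simply replaces the old one.
        self._slots[tech_id] = frozenset(slots)


store = AvailabilityStore()
store.replace_all(7, [AvailabilitySlot(date(2023, 6, 1), "North", True),
                      AvailabilitySlot(date(2023, 6, 2), "North", True)])

# The technician changes June 2 to a remote day in another region.
june_2 = date(2023, 6, 2)
updated = {s for s in store.load(7) if s.day != june_2}
updated.add(AvailabilitySlot(june_2, "South", False))
store.replace_all(7, updated)
```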

2 Answers


The standard strategy I would recommend when approaching such problems is:

  • start with the approach that is simplest to implement in your current environment/context

  • when you run into issues (like performance or concurrency issues), adapt as needed

In a typical business application where you think in terms of domain objects and repositories, and where you may have an ORM at hand, loading and storing full entities is often the simplest approach to implement. If that's your case, start with that.

For other situations, for example when you are implementing a new use case on an existing system where you cannot reuse any of the existing code for some reason, and that use case only uses a certain subset of attributes, it can be simpler to query and update only the required attributes.

Doc Brown
  • 1
    If you use CQRS to strategically optimize a few of your worst performing read queries, it is likely going to be long time before the extra field reads (during a write) have a significant impact of overall system performance. – DavidT May 21 '23 at 10:28
  2. We should only query the properties we need to execute application logic. When we save changes, we should have storage methods for saving only the modified properties. This will allow concurrent processes to update different properties without blocking each other.

This approach is flawed in many scenarios. For example, say one process wants to update an order's quantity and another wants to update the order's item. You want ten apples but end up with ten oranges.
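
A minimal Python sketch of that scenario, using a hypothetical in-memory `db` and an `update_field` helper that writes only the changed property (approach 2). Both writes succeed and neither blocks the other, yet the stored line matches neither user's intent.

```python
from dataclasses import dataclass


@dataclass
class OrderLine:
    item: str
    quantity: int


# Hypothetical storage: one order line, currently "1 apples".
db = {"line-1": OrderLine(item="apples", quantity=1)}


def update_field(key: str, name: str, value) -> None:
    """Approach 2: persist only the single property that changed."""
    setattr(db[key], name, value)


# User A, looking at "1 apples", changes the quantity: wants 10 apples.
update_field("line-1", "quantity", 10)
# User B, also looking at "1 apples", changes the item: wants 1 orange.
update_field("line-1", "item", "oranges")

print(db["line-1"])  # OrderLine(item='oranges', quantity=10)
```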

This forces you to add a "poor man's transaction", where your UI is always checking whether anything you were editing has changed and prompting the user to check for consistency.
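
One common alternative to that polling is optimistic concurrency: store a version number with the entity and reject a save if the version changed since the entity was loaded. The sketch below assumes a hypothetical in-memory `OrderRepository` and `ConcurrencyError`; with SQL the same check is typically an `UPDATE ... WHERE id = ? AND version = ?` followed by verifying that exactly one row was affected.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass


class ConcurrencyError(Exception):
    """Raised when the stored entity changed after the caller loaded it."""


@dataclass
class Order:
    id: int
    item: str
    quantity: int
    version: int = 0


class OrderRepository:
    """Hypothetical in-memory repository using optimistic concurrency."""

    def __init__(self) -> None:
        self._rows: dict[int, Order] = {}

    def add(self, order: Order) -> None:
        self._rows[order.id] = copy.deepcopy(order)

    def get(self, order_id: int) -> Order:
        return copy.deepcopy(self._rows[order_id])

    def save(self, order: Order) -> None:
        current = self._rows[order.id]
        # Reject the write if someone else saved after this copy was loaded.
        if current.version != order.version:
            raise ConcurrencyError(
                f"order {order.id} is at version {current.version}, "
                f"caller loaded version {order.version}")
        saved = copy.deepcopy(order)
        saved.version += 1
        self._rows[order.id] = saved
        order.version = saved.version


repo = OrderRepository()
repo.add(Order(1, "apples", 1))

a = repo.get(1)  # user A loads version 0
b = repo.get(1)  # user B loads version 0
a.quantity = 10
repo.save(a)     # succeeds; stored version becomes 1

b.item = "oranges"
conflict = False
try:
    repo.save(b)  # B still holds version 0, so the save is rejected
except ConcurrencyError as err:
    conflict = True
    print(err)    # B must reload and decide how to proceed
```

User B is told to reload instead of silently producing ten oranges, so the UI only has to handle a rejected save rather than continually checking for changes.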

Additionally, only getting the properties you "need" can greatly increase the amount of code you have to write and edit when new requirements come along. If I was previously only getting the order item, but now the requirement is to make it green when the status is InStock, I need to go back through and change every layer. I probably even need to keep the old JustGetTheItemPartsOfTheOrder() method in case it's used elsewhere, doubling up on my code.

It's not always wrong, obviously. Objects can be too big for uploading the whole thing every time to be sensible, you may have a small change on many objects ("cancel all the orders of apples! we ran out!"), or you may want to save the change that was made rather than the entire object ("edited line 7 of the novel to read xxx").
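
The "cancel all the orders of apples" case is where a targeted, set-based update tends to beat loading and re-saving every entity. A sketch using Python's built-in `sqlite3` module with an in-memory database; the `order_lines` table and its columns are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_lines ("
    "id INTEGER PRIMARY KEY, item TEXT, quantity INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO order_lines (item, quantity, status) VALUES (?, ?, ?)",
    [("apples", 10, "open"), ("oranges", 3, "open"), ("apples", 2, "open")],
)

# One statement changes only the status column of the affected rows,
# instead of loading and re-saving every order entity.
cur = conn.execute(
    "UPDATE order_lines SET status = 'cancelled' WHERE item = ? AND status = 'open'",
    ("apples",),
)
conn.commit()
print(cur.rowcount)  # 2
```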

My general guidance would be "write your application as if it hasn't got a data layer". This forces you to think about your application in terms of code and objects rather than as an interface for a database.
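
One way to read that advice, sketched in Python: domain logic lives on plain objects, the application service depends only on a small repository protocol, and an in-memory implementation stands in for the data layer. `Order`, `OrderRepository`, `InMemoryOrderRepository`, and `cancel_order_item` are hypothetical names for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Order:
    id: int
    lines: dict[str, int] = field(default_factory=dict)  # item -> quantity
    status: str = "open"

    def cancel_item(self, item: str) -> None:
        # Pure domain logic: no SQL, no storage concerns.
        self.lines.pop(item, None)
        if not self.lines:
            self.status = "cancelled"


class OrderRepository(Protocol):
    def get(self, order_id: int) -> Order: ...
    def save(self, order: Order) -> None: ...


class InMemoryOrderRepository:
    """Stand-in used until (or instead of) a real data layer."""

    def __init__(self) -> None:
        self._orders: dict[int, Order] = {}

    def get(self, order_id: int) -> Order:
        return self._orders[order_id]

    def save(self, order: Order) -> None:
        self._orders[order.id] = order


def cancel_order_item(repo: OrderRepository, order_id: int, item: str) -> None:
    """Application service: load the entity, apply domain logic, save it whole."""
    order = repo.get(order_id)
    order.cancel_item(item)
    repo.save(order)


repo = InMemoryOrderRepository()
repo.save(Order(1, {"apples": 10}))
repo.save(Order(2, {"apples": 10, "pears": 2}))
cancel_order_item(repo, 1, "apples")
cancel_order_item(repo, 2, "apples")
```

Adding a database-backed repository later means writing another class with the same `get`/`save` methods; `Order` and `cancel_order_item` do not change.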

Ewan
  • The first paragraph isn't wrong but it would be similarly wrong to have either ten apples or one orange for both consumers - one of them will always overwrite the other one's work. The presumption of OP's suggestion is that one consumer only ever cares about the quantity and the other consumer only ever cares about the type of fruit. By extension, the former doesn't care if the type of fruit changes, and the latter doesn't care that the quantity changes. When (and **if**) this is the case, OP's suggestion is not wrong in the way this first paragraph points out. – Flater May 22 '23 at 02:31
  • 1
    disagree. the presumption is that you have a form with two buttons, update quality, update item vs one button update order. one button guarantees the order is what you think it is and not an amalgamation of individual changes. – Ewan May 22 '23 at 12:15