7

This is a classic problem which I'm sure has been solved many times by many different people. I don't have any formal training (I've not studied computer science or any other such academic subject) and so I'm not sure of the best way to solve the problem I'm about to describe.

If we imagine the below diagram is an example of a bankers dilemma (two users Foo and Bar have access to a single bank account: Baz). What is the expected behaviour when following one of the paths shown?

Note: I'm assuming we're using a mutex (or some other form of synchronisation) on the Baz variable.

Example 1: Baz initially holds the value 10. If Foo writes a new value (which is the result of removing 5 from the current value) before Bar; then Bar will end up taking 10 from the new value 5, leaving a minus balance (i.e. the final value will be -5). Meaning more money has been taken than available.

Example 2: Baz initially holds the value 10. If Bar writes a new value (which is the result of removing 10 from the current value) before Foo; then Foo will end up taking 5 from the new value 0, leaving a minus balance (i.e. the final value will be -5). Meaning more money has been taken than available.

Both actions (Foo (-5) and Bar (-10)) are triggered at the same time. So how do we ensure that either Foo or Bar is alerted to the fact that their transaction cannot be completed (as there are not enough funds for it to succeed)?

It seems a potential solution is to ensure the caller executes a method that uses a mutex internally to lock the value first; then once the value is locked we can read the value; and then check if the action is valid. If the condition passes then we update the value and release the lock on the value. Meaning the next caller will be able to lock the value down and run through the same steps.

But how would this approach work with a distributed system? You could suggest using a global data store, but it would have to be one that guarantees consistency (e.g. a service such as AWS' Dynamo DB offers "eventual consistency" and so wouldn't work for a banking institution); but guaranteed consistency is generally considered to be very slow (depending on the number of distributed nodes I assume).

So how do we attempt to solve this design problem?

Bankers Dilemma

Integralist
  • 171
  • 1
  • 4
  • Have you thought of how many nodes you really need to provide services to one single account Baz? – InformedA Oct 12 '14 at 17:08
  • Have a look at [Zookeeper](http://zookeeper.apache.org) – JensG Oct 12 '14 at 17:30
  • One thing that is important to keep in mind is the CAP theorem https://en.wikipedia.org/wiki/CAP_theorem , which roughly says you can't have a system that is consistent, available, and tolerant of partitions, but you can have any two of the three. You don't mention the need for partition tolerance. If you aren't worried about what happens when your system becomes partitioned, the problem can be solved by using locking. – Theodore Norvell Oct 12 '14 at 19:07
  • IMO, if you are going to use an explicit problem for your example then you should probably use explicit descriptive names and not use Foo/Bar/Baz. It makes the question easier to follow and I think because of that you'll get better answers. – Dunk Oct 13 '14 at 17:50
  • @Dunk understood. I'll remember that for next time. – Integralist Oct 14 '14 at 20:30

2 Answers2

4

I understand finance industry uses a system of "after-the-fact" checking and fixups to resolve errors.

ie, you make each transaction on the individual systems independently (such that you know that each system is correct) and you write to a log the details of each transaction. These logs are then compared later, and if an error occurred in one, the other is instructed to rollback its transaction.

So, Bank A successfully withdraws money, but bank B fails to credit it. Later on, the transaction lists from both are compared and Bank A gets a credit to make things right.

You cannot implement a distributed "lock" in such systems as they make not respond in the way you expect - and you do not want to lock someone's account while you make a withdrawal if you cannot tell how long it will be before the other system involved in the transaction will take to complete, you might end up with a lock that remains open blocking other transactions on that account.

gbjbaanb
  • 48,354
  • 6
  • 102
  • 172
  • I guess in a true banking system, this is why they don't have guarantee consistency. But how would this type of system work in a non-financial industry? There are applications developed that need to deal with similar problems of modifying data in a distributed/concurrent fashion. Although I used a banking example, the fundamental principles are what I'm interested in understanding. – Integralist Oct 13 '14 at 08:02
  • The same way - if you cannot lock all participants (and many times you cannot) then you need a reciprocating system such as this. Obviously it depends on your data, systems and access mechanisms. For example- move file, you can move the file to destination and then delete the original - but if you cannot delete the original after copying you can delete the destination. Many file systems do similar - changes go to cache, and a log. If your drive crashes before the cache is flushed the log is used to ensure file consistency by writing the changes or dropping them (eg compare FAT v NTFS). – gbjbaanb Oct 13 '14 at 08:12
  • @Integralist:Did you read up on "Transaction Processing"? Commit and Rollback are the key concepts that end the transaction. Other intermediate transaction states are "temporary" until they are committed or rolled back. Those intermediate states are dependent on what action(s) you are trying to perform. Bar withdraws 5, immediately withdraws 5 from Baz but that transaction might have a "pending" state assigned to it. That way if dispensing the 5 to Bar fails then the 5 could be rolled back and if Foo tries to withdraw 10 it'll find that there's only 5 remaining and reports a failure. – Dunk Oct 13 '14 at 18:14
3

For a distributed system, you would either:

a) Use "subtract amount or return error if you can't", where the code responsible for baz returns an error if the result would've been negative (or returns "success" if there wasn't an error)

b) Use the equivalent of locking; where the code responsible for baz has an "acquire baz" and "release baz" that need to be used before and after.

Note that this is typically just the tip of the iceberg. More likely is that you've got 2 or more bank accounts, and want to transfer funds from one to the others such that either all accounts are updated or none are updated. In this case you might (e.g.) end up with a combination.

For example, if there are two accounts "Fred" and "Jane" and you want to transfer $5 from Fred to Jane; then you might end up with a sequence like:

  • From you to Fred's account: "If Fred's account is 5 or greater lock Fred's account and tell me I can proceed, else tell me I can't proceed"

  • From Fred's account to you: "You may proceed"

  • From you to Jane's account: "If Jane's account can be increased by 5 lock Jane's account and tell me I can proceed, else tell me I can't proceed"

  • From Jane's account to you: "You may proceed"

  • From you to Fred's account: "Subtract 5 from Fred's account and release the lock you gave me previously"

  • From you to Jane's account: "Add 5 to Jane's account and release the lock you gave me previously"

Note that for this example; you, Fred's account and Jane's account may all be running on completely different computers communicating with messages/packets (with no shared memory at all).

Brendan
  • 3,895
  • 21
  • 21
  • "just the tip of the iceberg" -- it always amuses me when the beginers guide to java or whatever has a three line program to illustrate a bank transfer. In a real banking system this simple operation is implemented in several thousand lines of code. :-) – James Anderson Oct 13 '14 at 02:15
  • Thanks @Brendan for the feedback. In the breakdown of transferring money between accounts. What type of data store would we be considering here? Is this a single/global data point that we are accessing to retrieve the balance from each account? If the data store wasn't distributed then what happens if our global point of access fails? i.e. all users would be affected as apposed to when the data is distributed. – Integralist Oct 13 '14 at 07:36
  • @Integralist: The "banker's dilemma" is not about actual banking, in the same way that "dining philosophers" has nothing to do with dining or philosophers, and "baker's algorithm" isn't about baking. They are analogies used for reasoning about various concurrency issues and (like all analogies) not literal. – Brendan Oct 13 '14 at 14:02
  • @Integralist: For distributed systems there is no global data and no shared memory at the lowest level, and it's a bad idea (for performance and fault handling) for lower levels to emulate shared memory for higher levels. Typically each piece of data has an owner (where the owner is a thread or process on a specific computer); and where threads/processes on different computers communicate via. messaging (and not shared memory). – Brendan Oct 13 '14 at 14:11
  • @Integralist: If this wasn't a distributed system; the same approach ("shared nothing, with messaging") still works fine; but there may be completely different approaches that have less overhead (possibly including shared memory and locks, and transactional memory). Of course a solution for a specific problem depends on the specific problem. For "banker's dilemma" and nothing else you'd probably just use an atomic variable (and a lock-less "compare and swap" style thing). – Brendan Oct 13 '14 at 14:17