3

How do I ensure consistency between two separate systems which store and maintain the same set of data, when the method of communication between them is REST APIs and webhooks? Is it possible?

I find myself coming across this problem when working with APIs like MailChimp or eBay, where the third party system is acting as a complementary system to an internal CRM or an online shop. (Although I'm mentioning specific services here, what I'm getting at is a general way of achieving consistency using APIs and webhooks, which is why I think the question belongs here.)

Example

Let's say I call my application MyApp, and the third party application ThirdApp. Both store information on my clients: a unique ID, an email address, and some preferences.

When a client is updated on MyApp, I can make an API call to ThirdApp to inform them of the changes, so they can update their version of the data.

Similarly, if someone makes a change to a client entity on ThirdApp, I can have a webhook set up so that ThirdApp can inform MyApp of the change.

I don't want to be waiting for API calls to complete each time I update a client on MyApp - if ThirdApp's API stops responding, then MyApp would be unable to make changes, which is undesirable.

Similarly, ThirdApp just assumes MyApp is okay with any changes it applies, anyway. It doesn't wait to commit its changes - the webhook request it sends to advise MyApp of the change is really just a courtesy message.

Is it possible to ensure (perhaps eventual?) consistency in this situation?

Maybe a change queuing system?

I've thought about using some sort of intermediary queuing system on the MyApp side, with incoming and outgoing changes, but I always seem to be able to think of ways things might end up inconsistent. For example, imagine client 123 has an email address of john@hancock.com, and the pending queue looks something like this (none of the queue entries have been processed yet):

  1. OUTGOING MyApp says client ID 123's email address has changed to joe@example.com
  2. ...
  3. INCOMING ThirdApp says client ID 123's email address has changed to jane@example.com

Whether the queue has been processed or not, ThirdApp is correct in storing jane@example.com as the client's email address. However, as we sequentially process the queue:

  1. The outgoing change sends an API call to ThirdApp: "the new email address for 123 is joe@example.com".
  2. ... some other stuff happens, during which time both email addresses are wrong...
  3. The incoming change means MyApp updates its 123 entity so that the email address is jane@example.com. MyApp is now correct, but ThirdApp is wrong.
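
The sequence above can be reproduced with a small simulation. This is only a sketch: the two dictionaries are hypothetical in-memory stand-ins for the real systems, and it assumes ThirdApp does not send a further webhook for changes made through its API.

```python
from collections import deque

# Hypothetical stand-ins for each system's client table (client ID -> email).
my_app = {123: "joe@example.com"}      # MyApp has already applied its local edit
third_app = {123: "jane@example.com"}  # ThirdApp has already applied its own edit

# Pending queue, oldest first: (direction, client_id, new_email)
queue = deque([
    ("OUTGOING", 123, "joe@example.com"),
    ("INCOMING", 123, "jane@example.com"),
])


def process(queue, my_app, third_app):
    """Apply entries strictly in order, with no conflict detection."""
    while queue:
        direction, client_id, email = queue.popleft()
        if direction == "OUTGOING":
            third_app[client_id] = email  # stands in for the API call
        else:
            my_app[client_id] = email     # stands in for handling the webhook


process(queue, my_app, third_app)
print(my_app[123], third_app[123])  # jane@example.com joe@example.com
```

The two systems have simply swapped values, so processing the queue in order is not enough on its own when both sides changed the same record.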

Possibly to solve that I could implement some sort of look-ahead logic in the queue processor, which will try and determine what the "correct" action should be, based on what other subsequent entries affect the given record further down the queue. This seems like it could be very complicated though.

Maybe a third, single source of truth?

Maybe the solution is to have a third copy of the information, managed by a third system, which both MyApp and ThirdApp send update information to, and receive updates from. So any change, whether from MyApp or ThirdApp, is an incoming change there, and outgoing API calls are made to update other keepers of the data accordingly.

At this point I'm starting to really struggle to keep track of consistency in my head. Also, I'm sceptical that adding another copy of the data into the mix will solve the problem - it seems likely that it would make it more complex, and therefore worse.

I also find myself wondering whether I can trust a webhook request from ThirdApp to be timely or not. What if the change was made 20 minutes ago, but the webhook request only just got sent? Any change MyApp made within those 20 minutes should then supersede it.

Alex
    The queue is fine. The problem you describe is only temporary. If your logic works the queue for all third parties in the original order, you will get eventual consistency. A look-ahead might save a transaction or two but is probably not worth the extra complexity, unless you expect users to be constantly changing their email address. – John Wu Jul 31 '18 at 02:38

3 Answers

0

Just a first impression here: I think that adding an additional layer of complexity is a bad idea.

Possibly, add a validation step on your end that checks the other system a set time after a queued request is processed (say, 5 minutes after a change) to confirm that both sides hold the same info. That would probably be a good stopgap if you are really worried about multiple changes.
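
A minimal sketch of such a check, assuming both sides' records have already been fetched into plain dictionaries keyed by client ID (the function name `find_drift` and the data shapes are made up for illustration):

```python
def find_drift(local, remote):
    """Return {client_id: (local_value, remote_value)} for every mismatch.

    A client that exists on only one side is reported with None
    for the side where it is missing.
    """
    drift = {}
    for client_id in local.keys() | remote.keys():
        mine, theirs = local.get(client_id), remote.get(client_id)
        if mine != theirs:
            drift[client_id] = (mine, theirs)
    return drift


local = {123: "jane@example.com", 124: "bob@example.com"}
remote = {123: "joe@example.com", 124: "bob@example.com"}
print(find_drift(local, remote))
```

Anything this reports still needs a rule for deciding which side wins; the check only detects the disagreement.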

Or, make changes based on a timestamp: if an incoming change's timestamp is earlier than your most recent change, reject it and reassert to ThirdApp that your version takes precedence.
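
Roughly, that could look like the sketch below. It assumes the webhook payload carries the time the change was made (not the time the webhook was sent) and that both systems' clocks are close enough to compare; `Client` and `apply_incoming` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Client:
    email: str
    updated_at: float  # seconds since the epoch, when the change was made


def apply_incoming(current: Client, email: str, changed_at: float) -> bool:
    """Apply a webhook change only if it is newer than what we hold.

    Returns False for a stale change, in which case the caller should
    push its own version back to ThirdApp.
    """
    if changed_at <= current.updated_at:
        return False
    current.email = email
    current.updated_at = changed_at
    return True


client = Client(email="joe@example.com", updated_at=1_000.0)
# A webhook that arrives late, for a change made before our last edit:
print(apply_incoming(client, "jane@example.com", changed_at=900.0))  # False
```

This also covers the delayed-webhook worry in the question, because the comparison uses when the change happened rather than when the message arrived.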

IT Alex
0

Here you can treat MyApp as the master, or source of truth, for the third-party app. Using a REST API to communicate is fine, and you will achieve eventual consistency. The main problem is ensuring eventual consistency between this leader and its replica. You can use a timestamp or unique ID to measure how far the replica lags behind on updates.

You can use sync or async communication with the third party, based on your requirements. Sync keeps the data at MyApp and the third-party app up to date with each other, but then you have to wait for confirmation. Most systems settle for eventual consistency.
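
As an illustration of the async option, here is a sketch of an in-memory outbox: MyApp records the change and returns immediately, and a separate worker sends the pending changes in order. The `Outbox` class and the `send` callable are assumptions for the example, not part of any real API, and a real queue would need to be persisted.

```python
from collections import deque


class Outbox:
    """Holds changes that still have to be pushed to the third party."""

    def __init__(self, send):
        self.send = send        # callable doing the API call; may raise ConnectionError
        self.pending = deque()  # oldest change first

    def record(self, client_id, email):
        """Called right after the local write; never blocks on the network."""
        self.pending.append((client_id, email))

    def drain(self):
        """Send pending changes in order, stopping at the first failure.

        Returns how many changes were delivered. A failed change stays at
        the head of the queue so the original order is preserved on retry.
        """
        sent = 0
        while self.pending:
            client_id, email = self.pending[0]
            try:
                self.send(client_id, email)
            except ConnectionError:
                break
            self.pending.popleft()
            sent += 1
        return sent
```

A worker would call `drain()` periodically; while the third party's API is down, MyApp keeps accepting changes and the replica catches up later.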

0

What is missing from your setup is Lamport clocks. Leslie Lamport understood that in a distributed system everything is relative, including the notions of time and causality. That is, MyApp and ThirdApp can perform incompatible updates to the same item at the same NTP timestamp, and we won't know which one should win.

But the client does. Presumably your client C arranged for MyApp to use "joe" and for ThirdApp to use "jane", and those updates are racy. How shall we decide which one wins? It is ambiguous, we lack an arbiter function. Why is that, how shall we fix it? We sent too few parameters, and we should instead ask C to send a (value, timestamp) tuple when communicating with either of the APIs. A Lamport timestamp could be a simple serial number, or it could resemble a time-of-day counter.

Now a single host, C, is dictating the "happens before" relationships. It is trivial for either API to order updates based on C's timestamp, and arrange for "last one wins!" when persisting a result. Under these rules, the client will never be surprised, will never observe a "happens before" inversion.
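
A small sketch of that idea, using a plain serial number as the timestamp. `ClientClock` and `Store` are hypothetical names; a full Lamport clock would also advance its counter when it receives a message, which is not needed here because only C issues timestamps.

```python
class ClientClock:
    """Counter owned by client C; each update gets the next serial number."""

    def __init__(self):
        self.counter = 0

    def stamp(self, value):
        self.counter += 1
        return (value, self.counter)


class Store:
    """Stand-in for MyApp or ThirdApp: keeps the value with the highest stamp."""

    def __init__(self):
        self.value, self.stamp = None, 0

    def apply(self, update):
        value, stamp = update
        if stamp > self.stamp:  # last one wins; older updates are ignored
            self.value, self.stamp = value, stamp


clock = ClientClock()
joe = clock.stamp("joe@example.com")
jane = clock.stamp("jane@example.com")

my_app, third_app = Store(), Store()
# The same two updates reach the two systems in opposite orders.
for update in (joe, jane):
    my_app.apply(update)
for update in (jane, joe):
    third_app.apply(update)

print(my_app.value, third_app.value)  # jane@example.com jane@example.com
```

Both stores converge on the same value regardless of arrival order, because the ordering comes from C's counter rather than from each system's clock.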

J_H