62

Quite often in my work the idea of 2-way data synchronisation between database systems crops up. The classic example is two slightly different CRM systems (say, Raiser's Edge and Salesforce) and the need to have a two-way sync of Contact data between them.

API considerations aside, assuming you have a shared key to sync on, and purely thinking of the algorithm/pattern to be employed, this is a task that is often underestimated by non-techies.

For example, you have to watch out for:

  • Can you easily detect which records have changed in both systems (or will you have to compare all records between the systems to detect changes)
  • If you're going for a once-every-N-hours sync, how to deal with conflicts where the same record changes at more-or-less the same time in both systems
  • If you're going for a real-time sync (ie an update in one system immediately triggers an update to the other system) how to handle divergence over time due to bugs or system crashes

Personally I can think of ways to tackle all this but I'm wondering if there are any well known patterns, literature or best practices that I could refer to.

codeulike
  • 801
  • 1
  • 7
  • 10
  • what you describe sounds pretty close to [Federated database system](http://en.wikipedia.org/wiki/Federated_database_system) - is that correct? – gnat Feb 17 '12 at 08:03
  • @gnat: Thanks for the link, some of the concerns are similar (eg dealing with heterogeneity), but I'm talking about syncing a subset of data from two autonomous databases whereas that seems to be more about creating a fully integrated view of everything across multiple dbs. – codeulike Feb 20 '12 at 11:48
  • 5
    7 years later, 50 upvotes but only 1 decent answer. There must be some syncronisation patterns or best practices out there? – codeulike Feb 14 '19 at 19:56
  • Seems to me that, in addition to the shared key, you need at least a *last time updated* stamp on each system’s records. That or you could hash the shared fields and compare hashes. It is a pretty low catch for a good Q (at least if MS’s stack cant be applied). – JL Peyret Jun 05 '23 at 01:43

2 Answers2

9

Yes, a hard problem, easily underestimated. And could be a lot of work. If you are on Microsoft technologies, you may want to have a look at Microsoft Sync Framework here and here.

codingoutloud
  • 716
  • 5
  • 8
  • 1
    Thanks, thats interesting. I'd heard of Ms Sync Framework but hadn't realised it was so generalised. It basically is a pattern for handling sync issues in general. – codeulike Feb 20 '12 at 11:35
  • 2
    Microsoft Sync Framework was replaced by Microsoft Sync Framework Toolkit. – Tomas Kubes Dec 09 '14 at 20:56
  • I am frustrated with the docs, which not that clear, specially for Non SQL-Server ADO.NET data providers, which is my case. Besides, my workplace is looking for something that does not require adding infrastructure tables/making changes in the Production environment. So I am about to discard this one. – Veverke May 14 '17 at 13:24
0

There are many theories about remote site DB synchronization. First start with INSERT. handling this one is easy - as you can create a unique ID for every site (for example an initial of the site name + ID (number): site_a_177 vs. site_b_53)

So insert should not create any conflicts. the problem is the update. I don't believe that there's a 100% failure proof method, but you can start an update by "locking" the record in the remote DB, and only after you got the handle - continue with the update, and finish by syncing the update and only then release the lock.

Nir Alfasi
  • 193
  • 2
  • 8
  • 1
    Thanks, I think you're talking about distributed dbs with the same schema and dealing with distributed transactions. I'm thinking more of scenarios where the two DBs are completely autonomous (e.g. they assign unique ids in completely different ways and the schemas differ) but you want to sync a subset of the data in them. – codeulike Feb 20 '12 at 11:51
  • It sounds like there shouldn't be any conflicts. In that case it should be very simple - just save the "last record-id" that was synced for each table and continue from there. – Nir Alfasi Feb 20 '12 at 16:19