6

I'm currently assigned a task where I have to program a bidirectional synchronisation between two ERP-related software products, one providing web services (Visual Studio Web Reference) and the other providing a REST API.

I managed to match the fields of the two APIs, so I could in principle start creating objects in both using the same data. However, I have now reached the actual sync, and it's harder than I thought.

In my head, everything goes fine.

Grab the data -> determine what changed on which side -> sync the data from the source system to the target system

But in actual coding, I'm encountering some serious problems:

  • Neither API offers a way to tell when data was last modified, which makes it hard to determine what kind of change occurred (I thought about keeping a list from the previous sync to compare against, but that leads to a different problem)
  • The system with the REST API has priority, so if there are 2 changes at the same time, the change in the REST API wins
  • In which order should they be synced? If the REST API has priority, then its changes would need to be applied after the webservice's, right?
  • The sync software would run on a server, currently in a permanent loop of getting the data, comparing it and applying the changes. I'd love to sync only the changed data, but that isn't possible, because I have no way to determine what change occurred (something was created, something was changed, something was deleted)

Can you give me advice on how to accomplish this in a reliable and resource-friendly way?

The webservice API is a lot slower than the REST API, meaning there's an interval in which nothing gets synced (it may sound stupid, but I thought I could solve this problem single-threaded). Might I need multi-threading here? The webservice API is rate limited to 10000/hr; I don't know whether this could become a problem.

Robert Harvey
  • I would start by asking the person who assigned you the task if it is OK to let the last change win. There's really no such thing as "2 changes at the same time," and if there is, simply apply the REST change last. – Robert Harvey Aug 17 '16 at 13:50
  • I'll update my post accordingly: the webservice API is a lot slower, which means that there's an interval of not syncing. In this interval, there could be changes. – Jean Luc Nürrenberg Aug 17 '16 at 13:51
  • Do these changes really have to happen in real-time, or merely whenever the system gets around to making them? – Robert Harvey Aug 17 '16 at 13:55
  • No, they don't. They could also be applied every few hours. – Jean Luc Nürrenberg Aug 17 '16 at 13:56
  • Can you simply delay the non-REST API a bit? – Robert Harvey Aug 17 '16 at 13:58
  • Both APIs return the same number of objects (823, to be exact). The REST API returns an answer containing all objects in around a second, while the webservice needs around 26 seconds, which is incredibly long. – Jean Luc Nürrenberg Aug 17 '16 at 13:59
  • Seems to me like the webservice API should call the REST API to get its result, instead of whatever it is doing now. You'd get an enormous speed improvement, and your sync problems would essentially evaporate. – Robert Harvey Aug 17 '16 at 14:02
  • @RobertHarvey If I understand the problem correctly, he doesn't have control over either interface and changes can be made to either system at any time. – JimmyJames Aug 17 '16 at 14:04
  • Then it can't be done. You have to live with whatever characteristics the interfaces already provide. – Robert Harvey Aug 17 '16 at 14:05
  • @JimmyJames is right, I don't have control over either interface here. Both are provided by 2 different companies and work differently. – Jean Luc Nürrenberg Aug 17 '16 at 14:05
  • That means you have to get someone to sign off on the premise that the timing differences aren't going to matter. Unless the information you need to evaluate timings is *already present in the interfaces,* your task is impossible. – Robert Harvey Aug 17 '16 at 14:06
  • Timing doesn't matter, they should just get synchronised *at some point* (like every few hours). Sure, it'd be great if it was real-time, but if it doesn't work, I can't do anything about it. – Jean Luc Nürrenberg Aug 17 '16 at 14:09
  • @JeanLucNürrenberg You've only given us a keyhole to look through, but based on what you've provided, I think the larger design is untenable. If you have two systems that are both primary sources for the same records, and either can be updated at any time, there's no real way to know which system has the correct current data. To really solve this issue, you need to address how the changes are being made. For example, you would build a service that takes the changes and applies them to both systems concurrently. – JimmyJames Aug 17 '16 at 14:21
  • I just got told by one of the developers of the REST API that they implemented the last modified field. Now it's only the webservice that doesn't have this. How would I go about checking which is newer now? Is it still as impossible as before, since I can only check it on one side? – Jean Luc Nürrenberg Aug 17 '16 at 14:49
  • @JeanLucNürrenberg You are better off than before but it's still not clear to me why it's OK to overwrite changes in one system with changes from another. If you have a conflict, how do you know it's OK to overwrite the data in the one that was say, 5 minutes older? Think about working with other people using source control. If you and your co-worker both update the same source file and you check-in first, would you be OK with your team-member destructively overwriting your changes? – JimmyJames Aug 17 '16 at 16:59
  • I talked about that with my boss, and he said the last change applied should win. I know it isn't optimal, but there's no way of doing conflict resolution at this point. – Jean Luc Nürrenberg Aug 18 '16 at 10:03
  • @JeanLucNürrenberg Then you still need the last update time from the other API; having it from only one doesn't really help much, because you have nothing to compare it to. You are stuck with looking at all the records to see what changed. You can use the last changed date on the REST API instead of hashing or capturing the result, but you are basically left with applying whatever has changed since the last time. – JimmyJames Aug 18 '16 at 13:07
  • Yeah, I see. We'll do a one-way synchronisation first, and then figure something out for the bidirectional method. – Jean Luc Nürrenberg Aug 19 '16 at 06:16
  • @JeanLucNürrenberg I know this has been a while, but for the system that doesn't have a record-level last modified timestamp, could you put a proxy in front of it? Essentially, if their security model sucks, you might be able to intercept change requests, log what is changing and at what time, and then use that to help with synchronization. – Adrian Dec 15 '16 at 19:41

2 Answers

5

Welcome to the world of integration. The hardest thing about it is how poorly most vendors understand what you need and the resulting gaps in their interfaces.

In this case you seem to have a really intractable problem:

  1. You need to know precisely when changes were made
  2. There is no way to tell exactly when changes were made

No amount of design in your integration will fix this flaw in the API. You either have to get the vendors to address this issue (a last update time, among other things, should be included; there's no justification for not providing it) or you need to come up with a different approach to meet your requirements.

I think what you might be trying to do is, on an interval, look at the two systems and apply any changes on one to the other. If both systems have changed in the interval, you want to prefer one over the other. This is possible, but you will probably lose changes made on the non-preferred system if your interval is fairly long. Based on what you have provided, the way to do this is to look at all the records and compare them to some sort of record that you keep to determine whether they have changed. For example, you can keep a table of the record ids and a hash of each record. When the hash changes, you know a change was made since the last time you checked, and you can then apply the update. This is not efficient, and you can have a dirty read issue.
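A minimal sketch of the id-and-hash table idea, in Python for brevity. The record ids, field names and the in-memory dictionaries are hypothetical; in practice the hashes would live in a persistent table and the records would come from whichever API is being polled.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Return a stable SHA-256 fingerprint of a record's fields."""
    # sort_keys makes the serialisation independent of field order.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_ids(records: dict, known_hashes: dict) -> set:
    """Return ids whose hash is missing from, or differs from, the stored table."""
    return {
        record_id
        for record_id, record in records.items()
        if known_hashes.get(record_id) != record_hash(record)
    }


# Hypothetical data: record id -> field values as fetched from one system.
previous = {"A1": {"name": "Bolt", "qty": 10}, "A2": {"name": "Nut", "qty": 5}}
known_hashes = {rid: record_hash(rec) for rid, rec in previous.items()}

# Next polling round: A1 is unchanged (only the field order differs), A2 changed.
current = {"A1": {"qty": 10, "name": "Bolt"}, "A2": {"name": "Nut", "qty": 7}}
print(changed_ids(current, known_hashes))  # {'A2'}
```

Note that this only says that a record changed since the last check, not when, which is why it cannot order two conflicting changes.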

The problem you are trying to solve is essentially that of eventual consistency. Bitcoin, for example, has a pretty interesting way of addressing it. I think if you read up on how it is accomplished, you'll find that you lack the requisite data elements to make it work.

JimmyJames
  • I can't remember the last time I worked in a system where this sort of thing mattered. Gaming systems, maybe? – Robert Harvey Aug 17 '16 at 13:56
  • @RobertHarvey So if you deposited money into an account and it disappeared because of some timing issue around synchronization, that wouldn't matter to you? – JimmyJames Aug 17 '16 at 14:02
  • That's a transactional problem, not a timing problem. You know that. – Robert Harvey Aug 17 '16 at 14:03
  • @RobertHarvey I think you are missing the problem here. There are two systems that can have changes, and changes in one system need to be reflected in the other. It's peer-to-peer, not client-server. – JimmyJames Aug 17 '16 at 14:06
  • What makes you think it is peer-to-peer? Neither web services nor REST services are generally peer-to-peer. – Robert Harvey Aug 17 '16 at 14:08
  • @RobertHarvey Based on the description of the problem he is running into. It's a familiar scenario. – JimmyJames Aug 17 '16 at 14:15
1

You can compare the data from both sides, but among other things, I think you'll run into the reappearing deletion problem.

I think you'll need to maintain a snapshot of each system, so that you can later compare each system with its own earlier state and find what has changed within that system alone. You then settle those changes with the other side while also updating both snapshots.
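As a rough illustration, comparing one system with its own earlier snapshot could look like the Python sketch below. The `Changes` type, the `diff_snapshot` name and the sample ids are made up for this example. The point is that a record present in the snapshot but missing now is known to be a deletion on this side, rather than something the other side created and that should be copied back.

```python
from typing import NamedTuple


class Changes(NamedTuple):
    created: set
    updated: set
    deleted: set


def diff_snapshot(snapshot: dict, current: dict) -> Changes:
    """Classify what happened in ONE system since its last snapshot."""
    created = set(current.keys() - snapshot.keys())
    deleted = set(snapshot.keys() - current.keys())
    updated = {
        rid
        for rid in current.keys() & snapshot.keys()
        if current[rid] != snapshot[rid]
    }
    return Changes(created, updated, deleted)


# Hypothetical data for one of the two systems: record id -> fields.
snapshot = {"A1": {"qty": 10}, "A2": {"qty": 5}, "A3": {"qty": 1}}
current = {"A1": {"qty": 10}, "A2": {"qty": 7}, "A4": {"qty": 2}}

changes = diff_snapshot(snapshot, current)
print(changes.created, changes.updated, changes.deleted)  # {'A4'} {'A2'} {'A3'}
```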

Using that kind of snapshot, you may miss some intermediate updates (e.g. same record updated twice), though I'm guessing that's probably ok.

However, I'd make sure the snapshots are atomically obtained, so the snapshot is fully correct for a single point in time.
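The settling step could then be sketched as follows, assuming the rule from the question that the REST side wins when both sides changed the same record. The `settle` function and the `("upsert", fields)` / `("delete", None)` change format are hypothetical conventions for this example, not part of either API.

```python
def settle(rest_changes: dict, ws_changes: dict):
    """Return (to_webservice, to_rest): the changes each side must receive.

    Each argument maps record id -> ("upsert", fields) or ("delete", None),
    as found by comparing that system with its own snapshot.
    """
    # Every REST change is pushed to the web service, conflicts included.
    to_webservice = dict(rest_changes)
    # Web service changes are pushed only for records REST did not touch,
    # so a conflicting web service change is dropped (REST has priority).
    to_rest = {
        rid: change
        for rid, change in ws_changes.items()
        if rid not in rest_changes
    }
    return to_webservice, to_rest


# Hypothetical change sets since the last pair of snapshots.
rest_changes = {"A1": ("upsert", {"qty": 7}), "A3": ("delete", None)}
ws_changes = {"A1": ("upsert", {"qty": 9}), "A2": ("upsert", {"qty": 1})}

to_webservice, to_rest = settle(rest_changes, ws_changes)
print(to_webservice)  # both REST changes, including the conflicting A1
print(to_rest)        # only A2; the web service's A1 change loses
```

After both sets of changes are applied, both snapshots would be replaced with the settled state, so the next round starts from a consistent baseline.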

Erik Eidt