
Our client has a CRM application (let's call it A) and a custom-built Windows application written in VB.NET (let's call it B) that reads from and writes to the CRM database directly.

Application A is used (mainly) to manage clients and their orders. Application B is used to manage all the products our client sells.

Our client wants us to build a webshop (C) using ASP.NET WebForms so they can sell the products they manage with B. All of their clients and orders should be present in both A and C.

Since B will take care of all the communication with A, we will have to come up with a way to let B and C synchronise their data.

This is relatively new territory for us. We have done something similar in the past, but I really don't like the way it was solved there:

When someone orders something in the webshop, C puts the order in an XML file, zips it and uploads it to an FTP server in A's network. A monitors that directory, unzips each new file, processes the XML, creates a response XML file, zips it and uploads it to an FTP server in C's network. C monitors that directory, unzips each new file and processes the XML.

In addition to this, each night A generates an XML file for each table in its database, zips them and sends them to the FTP server in C's network. C unzips the files and lets SSIS process them to update C's database.

I really don't like that.

There must be other ways, right? We probably need both single-record synchronisation and batch synchronisation. How is this usually tackled? Web services? Something custom like Rhino ETL? Something else?

Update: It would be nice if you could also give a short explanation of how the technology you propose is typically used and how it impacts the different applications.

Kristof Claes
  • The big question I have about this is: why do you need to maintain the data on your "webshop"? "Store fronts", as they are usually called, do not store anything; they are simply a database front end. – Darknight May 19 '11 at 14:50
  • I don't know how we could do it without a separate database for the webshop. The webshop database will contain information about products that is not present in the other databases. The web and database servers for the webshop are located in the U.S.; the other database runs internally at our client's site in Belgium. Letting the web server in the U.S. connect to the internal database in Belgium on each request is not feasible. – Kristof Claes May 19 '11 at 17:06
  • Does the data need to be in both databases in real time, or can there be a certain amount of lag? What database backends are involved? Are they on the same server, or on servers that can be linked? – HLGEM May 19 '11 at 18:41
  • Ok, that's a little clearer. In that case, the product information is unlikely to change much (almost read-only), and the transactions will be mostly inserts (almost write-only). What information is held at your client's: the products or the transactions? How, by whom and where are the products updated? – Darknight May 19 '11 at 22:48
  • @HLGEM - I don't know what database the CRM application (A) is using, but I think it's MS SQL Server 200X. The webshop will use MS SQL Server 2008. The first one is located in the LAN of our client in Belgium; the second will be in our datacenter in the U.S. Most of the data should be near real-time: when they change a product, it has to be visible on the shop. Every 30 minutes the stock data needs to be pushed to the website. – Kristof Claes May 20 '11 at 06:03
  • @Darknight - Existing products won't change very often, but they will change. Not all products will change every day, but every day some products will change. The client holds information about the products (but not all of it; they will be annotated on the website) and the direct sales (offline sales). Customers on the webshop need to see both their online and offline sales in their account. Online sales need to be transferred to the client database as well for further processing. – Kristof Claes May 20 '11 at 06:06
  • The issue here seems to be the heterogeneous nature of the two databases. MS SQL Server 2008 supports many different ways to synchronise two databases, but as far as I am aware both need to be at least SQL Server 2005. Perhaps others can confirm this. My thought would be master-master replication, but it's unlikely that SQL Server 2000 supports this? – Darknight May 20 '11 at 10:56

3 Answers


This is pretty bad. I've drawn a diagram to understand the situation better:

                                   (Foobar: XML/FTP/ZIP)
 ------<-----------> (A)<-|----------------------------------------> (C)
 | db |                   |
 ------<-----------> (B)  |  

It's pretty clear that "Foobar" needs to somehow directly access the db. The two options are:

  • Bypass (A) and directly access the db
  • Have application (A) give you an API (web service, RESTful, SOAP, etc.) to the db

I believe the first option is unlikely (but talk to your clients).

This leaves option two, which means the burden is on your client to develop the API to allow you to access their db.

Of course you can maintain an "offline" transaction db that stores transactions while the connection between "Foobar" and (C) is down and syncs when it's back up, but this is optional.

I hope that helps.

EDIT:

I propose this solution:

 ------<-----------> (A)  |
 | db |                   |
 ------<-----------> (B)  |  
    ^                     |         (WEB SERVICE)
    |______________> (W)<-|-----------------------------> (C) (ASP.NET Store Front)

They will need to host a web service (W) that exposes access to their db. This is standard practice.
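As a rough sketch, the operations (W) exposes might look something like this. This is an old-style ASMX web service; `ShopSyncService`, `Product`, `Order` and the method names are hypothetical, not taken from the actual applications:

    Imports System.Collections.Generic
    Imports System.Web.Services

    ' Sketch only: all names here are made up for illustration.
    <WebService(Namespace:="http://example.com/shopsync/")>
    Public Class ShopSyncService
        Inherits WebService

        ' (C) pulls the products modified since its last sync.
        <WebMethod()>
        Public Function GetChangedProducts(ByVal since As DateTime) As List(Of Product)
            ' Data access against the client's db omitted in this sketch.
            Return New List(Of Product)()
        End Function

        ' (C) pushes a new webshop order into the client's db.
        <WebMethod()>
        Public Sub SubmitOrder(ByVal order As Order)
            ' Insert the order into the client's db here.
        End Sub
    End Class

    Public Class Product
        Public Property Id As Integer
        Public Property Name As String
        Public Property ModifiedOn As DateTime
    End Class

    Public Class Order
        Public Property Id As Integer
        Public Property CustomerId As Integer
    End Class

(C) would then call these methods over HTTP instead of exchanging files.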

Now who builds the web services is for you and your client to decide.

Darknight
  • Directly accessing the database of `A` is indeed not an option. Extending `A` so that it provides an API is also not an option, because `A` is not a custom-built application; it is a mass-marketed CRM application. One option might be to let our client adjust their application (`B`) to provide an API to us, but they've already told us they don't want to adjust their application too much. – Kristof Claes May 19 '11 at 09:43

Have you considered a service-oriented architecture? I don't mean a big enterprisey thing. I mean creating RESTful services that encapsulate parts of your domain (customers, products, etc.) and keep them decoupled from each other. In a SOA there isn't synchronisation, there's collaboration between properly decoupled services.
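For example, a product service contract might look something like this. This is a hypothetical sketch using WCF's webHttpBinding; the contract, URI templates and `Product` type are illustrative, not an existing API:

    Imports System.Collections.Generic
    Imports System.ServiceModel
    Imports System.ServiceModel.Web

    ' Sketch only: names and URI templates are made up.
    <ServiceContract()>
    Public Interface IProductService

        ' GET /products/{id} returns a single product as JSON.
        <OperationContract()>
        <WebGet(UriTemplate:="products/{id}", ResponseFormat:=WebMessageFormat.Json)>
        Function GetProduct(ByVal id As String) As Product

        ' GET /products/changed?since={since} supports incremental syncs.
        <OperationContract()>
        <WebGet(UriTemplate:="products/changed?since={since}", ResponseFormat:=WebMessageFormat.Json)>
        Function GetChangedProducts(ByVal since As String) As List(Of Product)
    End Interface

    Public Class Product
        Public Property Id As Integer
        Public Property Name As String
    End Class

Each application then talks to the service that owns the data instead of receiving copies of it by file transfer.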

Sending zipped XML files by FTP would just be a humorously terrible thedailywtf-style anti-solution were it not (amazingly) considered reasonable by real people somewhere.

Rein Henrichs
  • I wouldn't call that an anti-solution - it would've been quite feasible, like, 15-20 years ago. And I bet there are an awful lot of places where FTP transport is still in wide use (and will be for many years to come - if they don't need fast processing, security, session handling etc., why change what may have been working for decades?). E.g., practically all mobile operators. IMHO the fact that nowadays there are better solutions available doesn't make the old ways ridiculous per se - although it may make *the persons* who stick to them ridiculous :-) – Péter Török May 19 '11 at 09:20

You can use Windows Communication Foundation (WCF). One endpoint publishes modifications, the other endpoints subscribe and receive notifications:

  • The product manager (B) publishes product changes; the webshop (C) subscribes. B needs to be modified so that it sends a WCF message on 'save'. C needs to be modified to add an event handler.
  • C publishes sales information. B subscribes.

If you need guaranteed delivery, you should explore Microsoft Message Queuing (MSMQ) as the data transport (WCF can run over HTTP, MSMQ and other transports; you can choose).
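A minimal sketch of what the publishing contract could look like, assuming made-up contract and type names; the one-way operations are what make it compatible with MSMQ via netMsmqBinding:

    Imports System.Runtime.Serialization
    Imports System.ServiceModel

    ' Sketch only: contract and type names are hypothetical.
    ' netMsmqBinding requires all operations to be one-way.
    <ServiceContract()>
    Public Interface IProductChangePublisher

        ' B calls this from its save logic for a single product.
        <OperationContract(IsOneWay:=True)>
        Sub ProductChanged(ByVal change As ProductChange)

        ' A nightly job calls this with everything modified since the last run.
        <OperationContract(IsOneWay:=True)>
        Sub ProductsChanged(ByVal changes As ProductChange())
    End Interface

    <DataContract()>
    Public Class ProductChange
        <DataMember()> Public Property ProductId As Integer
        <DataMember()> Public Property ChangedOn As DateTime
    End Class

C hosts an implementation of this contract and updates its own database in the handlers; switching between bindings such as basicHttpBinding and netMsmqBinding is a configuration change, not a code change.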

Endy Tjahjono
  • Thanks for the info, Endy. Is WCF suited to batch processing? What if the applications need to synchronise a lot (several thousands) of records? For example, data that won't be synchronised in real time but once every night? – Kristof Claes May 19 '11 at 12:07
  • It is possible: instead of on 'save', you can trigger the event at 03:00, for example, and send all products modified between yesterday 03:00 and today 03:00. You need to store the last modification date if you don't already. – Endy Tjahjono May 19 '11 at 12:11
  • I figured as much :-) But would you use one WCF call (I don't know if that's the correct terminology?) to transmit all several thousand records at once, or one call per record? Or maybe batch them up and make a few calls of a few hundred records each? What would be the performance impact? – Kristof Claes May 19 '11 at 12:14
  • Performance is better answered with experiment :) I don't think one call per record is wise though. – Endy Tjahjono May 19 '11 at 12:18
  • You will run into a message size ceiling pretty quickly with large batches, so keep that in mind. The ceiling can be adjusted, but probably not enough to accommodate hundreds of records at a time (YMMV depending on the size of each record). – Morgan Herlocker May 19 '11 at 13:06
  • What has yet to be actually answered is why the OP **needs** to have the data stored on the "webshop" end. – Darknight May 19 '11 at 14:51