How best do you represent a bi-directional sync in a REST API?

Question

Assuming a system where there's a Web Application with a resource, and a reference to a remote application with another similar resource, how do you represent a bi-directional sync action which synchronizes the 'local' resource with the 'remote' resource?

Example:

I have an API that represents a todo list.

GET/POST/PUT/DELETE /todos/, etc.

That API can reference remote TODO services.

GET/POST/PUT/DELETE /todo_services/, etc.

I can manipulate todos from the remote service through my API as a proxy via

GET/POST/PUT/DELETE /todo_services/abc123/, etc.

I want the ability to do a bi-directional sync between a local set of todos and the remote set of TODOS.

In an RPC sort of way, one could do

POST /todo_services/abc123/sync/

But, in the "verbs are bad" idea, is there a better way to represent this action?

I think that a good API design is absolutely dependent on a very concrete understanding of *what you mean by sync.* The "sync" of two data sources is usually a very complex problem that is very easy to oversimplify but very difficult to think through in all of its implications. Make it a "bi-directional" sync, and suddenly the difficulty is much higher. Start by thinking through the very difficult questions that come up. — Adam Crossland, Feb 16 '12 at 17:06
Right - assume the sync algorithm is designed and functional in the "code-level" API - how do I expose this through REST? One-way sync seems much easier to express: I ```GET /todo/1/``` and ```POST``` it to ```/todo_services/abc123/```. But with the two-way sync, I'm not taking a dataset and PUTting it to a resource; the action I'm taking actually results in the potential modification of two resources. I guess I could fall back on having "todo synchronizations" being resources themselves: ```POST /todo_synchronizations/ {"todos":["/todo/1/","/todo_services/abc123/1"],"schedule":"now"}``` — Edward M Smith, Feb 16 '12 at 17:16
We still have a cart-before-the-horse issue. My point was that you can't assume the sync just works and design the API. The design of the API will be driven by numerous concerns of exactly how the sync algorithm works. — Adam Crossland, Feb 16 '12 at 17:20
That potentially exposes useful results: ```GET /todo_synchronizations/1``` => ```{"todos":["/todo/1/","/todo_services/abc123/1"],"schedule":"now","ran_at":"datetime","result":"success"}``` — Edward M Smith, Feb 16 '12 at 17:21
I agree with @Adam. Do you know how you are going to implement your sync? How are you handling changes? Do you simply have two sets of items that you want to reconcile or do you have a log of the actions that caused the two sets to diverge since the last sync? The reason I ask is it can be tricky to detect adds and deletes (regardless of REST). If you have an object server-side and don't have it client-side, you have to ask yourself, "Did the client delete it or did the server create it?" Only when you know precisely how the "resource" behaves can you accurately represent it in REST. — Raymond Saltrelli, Feb 16 '12 at 17:35
@RaySaltrelli, exactly what I'm talking about. Thanks for the elaboration. — Adam Crossland, Feb 16 '12 at 18:07
I understand what you're saying - there's no good answer in the event of the sync being a "black box" - any answer properly depends on the exact nature of the sync. I'll dig into the actual sync routine and if it's easily explainable, update my question. — Edward M Smith, Feb 16 '12 at 18:26

score 20 · Answer 1 · edited Jul 24 '16 at 20:32

Where and what are the resources?

REST is all about addressing resources in a stateless, discoverable manner. It does not have to be implemented over HTTP, nor does it have to rely on JSON or XML, although it is strongly recommended that a hypermedia data format is used (see the HATEOAS principle) since links and ids are desirable.

So, the question becomes: How does one think about synchronization in terms of resources?

What is bi-directional sync?

Bi-directional sync is the process of updating the resources present on a graph of nodes so that, at the end of the process, all nodes have updated their resources in accordance with the rules governing those resources. Typically, this is understood to be that all nodes would have the latest version of the resources as present within the graph. In the simplest case the graph consists of two nodes: local and remote. Local initiates the sync.

So the key resource that needs to be addressed is a transaction log and, therefore, a sync process might look like this for the "items" collection under HTTP:

Step 1 - Local retrieves the transaction log

Local: GET /remotehost/items/transactions?earliest=2000-01-01T12:34:56.789Z

Remote: 200 OK with body containing transaction log containing fields similar to this.

itemId - a UUID to provide a shared primary key
updatedAt - timestamp to provide a co-ordinated point when the data was last updated (assuming that a revision history is not required)
fingerprint - a SHA1 hash of the contents of the data for rapid comparison if updatedAt is a few seconds out
itemURI - a full URI to the item to allow retrieval later
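
As a rough illustration of the fields above, here is a minimal Python sketch of one log entry. The class name `LogEntry`, the helpers `fingerprint` and `make_entry`, the canonical-JSON encoding, and the `https://remotehost` base URI are all assumptions made for the example; the answer only specifies the four fields.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


def fingerprint(content: dict) -> str:
    """SHA-1 over a canonical JSON encoding, so equal content hashes equally."""
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class LogEntry:
    item_id: str      # itemId: shared primary key (a UUID)
    updated_at: str   # updatedAt: ISO 8601 timestamp in UTC
    fingerprint: str  # SHA-1 hash of the item content
    item_uri: str     # itemURI: where the full item can be retrieved later


def make_entry(item_id: str, content: dict, base_uri: str) -> LogEntry:
    """Build a log entry for an item that has just been written."""
    return LogEntry(
        item_id=item_id,
        updated_at=datetime.now(timezone.utc).isoformat(),
        fingerprint=fingerprint(content),
        item_uri=f"{base_uri}/items/{item_id}",
    )


# Hypothetical usage: a new todo item on a node reachable at https://remotehost
entry = make_entry(str(uuid.uuid4()), {"title": "buy milk"}, "https://remotehost")
```

Hashing a canonical encoding (sorted keys, fixed separators) matters here: two nodes that serialize the same item with different key order would otherwise produce different fingerprints.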

Step 2 - Local compares the remote transaction log with its own

This is where the business rules for syncing are applied. Typically, the itemId identifies the local resource, and the fingerprints are then compared. If they differ, updatedAt is compared. If the timestamps are too close to call then a decision will need to be made either to pull from the other node (perhaps it is more important) or to push to the other node (this node is more important). If the remote resource is not present locally then a push entry is made (this contains the actual data for insert/update). Any local resources not present in the remote transaction log are assumed to be unchanged.

The pull requests are made against the remote node, using the itemURI, so that the data is available locally. The pulled data is not applied locally until later.
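
The comparison in Step 2 could be sketched roughly as follows. This is only one possible reading of the rules: the function name `plan_sync`, the dict layout of the logs, the 5-second tolerance and the `tie_winner` parameter are assumptions for illustration. Items that appear only in the remote log are returned separately rather than decided here, since how to treat them depends on the business rules (compare the "did the client delete it or did the server create it?" comment on the question).

```python
from datetime import datetime, timedelta


def plan_sync(local_log, remote_log, tolerance=timedelta(seconds=5), tie_winner="remote"):
    """Return (pulls, pushes, remote_only) as lists of item ids.

    Each log maps itemId -> {"updatedAt": ISO 8601 string, "fingerprint": str}.
    """
    pulls, pushes, remote_only = [], [], []
    for item_id, remote in remote_log.items():
        local = local_log.get(item_id)
        if local is None:
            remote_only.append(item_id)  # left to the caller's business rules
            continue
        if local["fingerprint"] == remote["fingerprint"]:
            continue  # identical content, nothing to do
        local_ts = datetime.fromisoformat(local["updatedAt"])
        remote_ts = datetime.fromisoformat(remote["updatedAt"])
        if abs(local_ts - remote_ts) <= tolerance:
            winner = tie_winner  # too close to call: the preferred node wins
        else:
            winner = "remote" if remote_ts > local_ts else "local"
        (pulls if winner == "remote" else pushes).append(item_id)
    return pulls, pushes, remote_only
```

Local items that are missing from the remote log never enter the loop, which matches the "assumed to be unchanged" rule above.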

Step 3 - Push local sync transaction log to remote

Local: PUT /remotehost/items/transactions with body containing the local sync transaction log.

The remote node might process this synchronously (if it's small and quick) or asynchronously (think 202 ACCEPTED) if it's likely to incur a lot of overhead. Assuming a synchronous operation, then the outcome will be either 200 OK or 409 CONFLICT depending on the success or failure. In the case of a 409 CONFLICT, then the process has to be started again since there has been an optimistic locking failure at the remote node (someone changed the data during the sync). The remote updates are processed under their own application transaction.
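
To make the "start again on 409" behaviour concrete, here is a small sketch that uses an in-memory stand-in instead of real HTTP calls. `FakeRemote`, its version counter and `sync_with_retry` are hypothetical names; in particular, detecting the conflict by comparing a version number is an assumption, since the answer does not say how the remote node implements its optimistic lock.

```python
class FakeRemote:
    """In-memory stand-in for the remote node (hypothetical, no real HTTP)."""

    def __init__(self):
        self.version = 0
        self.items = {}

    def get_log(self):
        # Stands in for GET /items/transactions
        return self.version, dict(self.items)

    def put_log(self, seen_version, changes):
        # Stands in for PUT /items/transactions
        if seen_version != self.version:
            return 409  # someone changed the data during the sync
        self.items.update(changes)
        self.version += 1
        return 200


def sync_with_retry(remote, build_changes, max_attempts=3):
    """Run Steps 1 to 3, starting again whenever the remote answers 409.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        version, remote_items = remote.get_log()      # Step 1
        changes = build_changes(remote_items)         # Step 2
        status = remote.put_log(version, changes)     # Step 3
        if status == 200:
            return attempt
    raise RuntimeError("gave up after repeated 409 Conflict responses")
```

Capping the number of attempts keeps a busy remote node from trapping the client in an endless retry loop.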

Step 4 - Update locally

The data pulled in Step 2 is applied locally under an application transaction.

While the above is not perfect (there are several situations where local and remote may get into trouble and having remote pull data from local is probably more efficient than stuffing it into a big PUT) it does demonstrate how REST can be used during a bi-directional synchronization process.

This doesn't handle errors properly as the server is never notified of any errors on the client. Also, step 4 can be done before step 3. `updatedAt` relies on synchronized timestamps between the two sides when sequence numbers are safer. — user239558, Jan 29 '20 at 08:02

score 7 · Answer 2 · answered May 20 '12 at 07:11

I would consider a synchronization operation as a resource that can be accessed (GET) or created (POST). With that in mind, the API URL could be:

/todo_services/abc123/synchronization

(Calling it "synchronization", not "sync" to make it clear it's not a verb)

Then do:

POST /todo_services/abc123/synchronization

To initiate a synchronization. Since a synchronization operation is a resource, this call could potentially return an ID that can then be used to check the status of the operation:

GET /todo_services/abc123/synchronization?id=12345
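
A minimal in-memory sketch of treating each synchronization as a resource might look like this. The class `SynchronizationResource`, the `status` field, the `mark_done` hook and the choice of a 202 response are assumptions for illustration, not part of any real framework.

```python
import itertools


class SynchronizationResource:
    """In-memory stand-in for /todo_services/<id>/synchronization (hypothetical)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._runs = {}

    def post(self):
        """Create a synchronization and return (status_code, body)."""
        run_id = next(self._ids)
        self._runs[run_id] = {"id": run_id, "status": "pending"}
        return 202, dict(self._runs[run_id])

    def get(self, run_id):
        """Look up a synchronization by id and return (status_code, body)."""
        run = self._runs.get(run_id)
        if run is None:
            return 404, None
        return 200, dict(run)

    def mark_done(self, run_id, result):
        # Called by whatever worker actually performs the sync.
        self._runs[run_id]["status"] = result
```

The client keeps the id from the POST response and polls with GET until the status is no longer "pending".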

laurent

This simple answer is THE answer. Turn your verbs into nouns and move on... – Dave Mar 17 '15 at 12:46
A bi-directional sync is not an operation that happens on the server. You cannot 'check the status' of the operation like that. – user239558 Jan 29 '20 at 08:06

score 6 · Answer 3 · answered Feb 19 '12 at 03:37

This is a hard problem. I do not believe REST is an appropriate level to implement sync. A robust sync would essentially need to be a distributed transaction. REST is not the tool for that job.

(Assumption: by "sync" you are implying that either resource can change independently of the other at any time, and you want the ability to realign them without losing updates.)

You may want to consider making one the "master" and the other the "slave" so that you can confidently clobber the slave periodically with data from the master.

You may also wish to consider the Microsoft Sync Framework if you absolutely need to support independently changing data stores. This would not work through REST, but behind the scenes.

+1 for "hard problem". Bi-directional syncing is one of those things that you don't realize how hard it is until you're deep in the mud. — Dan Ray, Apr 19 '12 at 15:55

score 2 · Answer 4 · answered Jul 26 '13 at 16:01

Apache CouchDB is a database which is based on REST, HTTP, and JSON. Developers perform basic CRUD operations over HTTP. It also provides a replication mechanism which is peer-to-peer using only HTTP methods.

To provide this replication, CouchDB needs to have some CouchDB-specific conventions. None of these are opposed to REST. It provides each document (that is a REST resource within a database) with a revision number. This is part of the JSON representation of that document, but is also in the ETag HTTP header. Each database also has a sequence number which allows for tracking changes to the database as a whole.

For conflict resolution, they simply note that a document is conflicted and retain the conflicted versions, leaving it to the developers using the database to provide a conflict resolution algorithm.

You can either use CouchDB as your REST API, which will give you synchronization out of the box, or take a look at how it provides replication to provide a starting point for making your own algorithm.

I love CouchDB, and its successor CouchBase + SyncGateway. +1 — Leonid Usov, Aug 24 '16 at 12:22

score -1 · Answer 5 · answered Mar 20 '12 at 08:53

You can solve the "verbs are bad" problem with a simple renaming - use "updates" instead of "sync".

The sync process is actually sending a list of local updates made since the last sync, and receiving a list of updates made on the server in that same time.
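
A rough sketch of such an "updates" exchange, using an in-memory log with sequence numbers to mark "since the last sync". `UpdateLog` and `exchange` are hypothetical names, the sequence-number bookkeeping is an assumption, and conflict resolution is deliberately left out.

```python
class UpdateLog:
    """Append-only list of updates, each tagged with a sequence number."""

    def __init__(self):
        self.last_seq = 0
        self._updates = []  # list of (seq, item_id, value)

    def record(self, item_id, value):
        self.last_seq += 1
        self._updates.append((self.last_seq, item_id, value))

    def since(self, seq):
        """Return every update recorded after the given sequence number."""
        return [u for u in self._updates if u[0] > seq]


def exchange(server_log, client_updates, client_last_seen):
    """Stands in for a POST to an 'updates' resource.

    Accepts the client's updates and returns the server's updates made since
    the client's last sync, plus the sequence number to remember for next time.
    """
    server_updates = server_log.since(client_last_seen)
    for item_id, value in client_updates:
        server_log.record(item_id, value)
    return server_updates, server_log.last_seq
```

Because the returned sequence number already covers the updates the client just sent, the client does not get its own changes echoed back on the next exchange.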