Where and what are the resources?
REST is all about addressing resources in a stateless, discoverable manner. It does not have to be implemented over HTTP, nor does it have to rely on JSON or XML, although it is strongly recommended that a hypermedia data format be used (see the HATEOAS principle), since links and ids are desirable.
So, the question becomes: How does one think about synchronization in terms of resources?
What is bi-directional sync?
Bi-directional sync is the process of updating the resources present on a graph of nodes so that, at the end of the process, all nodes have updated their resources in accordance with the rules governing those resources. Typically, this is understood to be that all nodes would have the latest version of the resources as present within the graph. In the simplest case the graph consists of two nodes: local and remote. Local initiates the sync.
So the key resource that needs to be addressed is a transaction log and, therefore, a sync process might look like this for the "items" collection under HTTP:
Step 1 - Local retrieves the transaction log
Local: GET /remotehost/items/transactions?earliest=2000-01-01T12:34:56.789Z
Remote: 200 OK with a body containing the transaction log, with fields similar to these:
itemId - a UUID to provide a shared primary key
updatedAt - a timestamp to provide a co-ordinated point when the data was last updated (assuming that a revision history is not required)
fingerprint - a SHA1 hash of the contents of the data for rapid comparison if updatedAt is a few seconds out
itemURI - a full URI to the item to allow retrieval later
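As a rough illustration, one entry of such a log could be modelled as below. This is only a sketch: the Python names (TransactionEntry, fingerprint, make_entry), the canonical JSON encoding used before hashing, and the base URI are assumptions made for the example, not part of any fixed protocol.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TransactionEntry:
    item_id: str          # itemId: UUID shared by both nodes
    updated_at: datetime  # updatedAt: when the item last changed
    fingerprint: str      # SHA-1 hex digest of the item's contents
    item_uri: str         # itemURI: where the full item can be fetched


def fingerprint(item: dict) -> str:
    """SHA-1 over a canonical JSON encoding, so equal content hashes equally."""
    # Assumption: items are JSON-serialisable dicts; sorted keys make the
    # digest independent of key order.
    canonical = json.dumps(item, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def make_entry(item_id: str, item: dict, updated_at: datetime,
               base_uri: str) -> TransactionEntry:
    """Build a log entry for one item (base_uri is a hypothetical collection URI)."""
    return TransactionEntry(item_id, updated_at, fingerprint(item),
                            f"{base_uri}/{item_id}")
```

Hashing a canonical encoding matters here: if the two nodes serialise the same data with a different key order, a naive hash of the raw bytes would report a difference that is not really there.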
Step 2 - Local compares the remote transaction log with its own
This is the application of the business rules of how to sync. Typically, the itemId will identify the local resource, and the fingerprints are then compared. If there is a difference then a comparison of updatedAt is made. If these are too close to call then a decision will need to be made either to pull from the other node (perhaps it is more important) or to push to the other node (this node is more important). If the remote resource is not present locally then a push entry is made (this contains the actual data for insert/update). Any local resources not present in the remote transaction log are assumed to be unchanged.
The pull requests are made against the remote node using the itemURI, so that the data exists locally. They are not applied locally until later.
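The comparison for a single item that appears in both logs might be sketched as follows. The names Entry and decide, the two-second tolerance and the tie_winner parameter are all assumptions for illustration; the real business rules may well differ, and items missing from one side are deliberately left out of this sketch.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Entry(NamedTuple):
    item_id: str
    updated_at: datetime
    fingerprint: str


def decide(local: Entry, remote: Entry,
           tolerance: timedelta = timedelta(seconds=2),
           tie_winner: str = "remote") -> str:
    """Return "skip", "pull" or "push" for one item present on both nodes."""
    if local.fingerprint == remote.fingerprint:
        return "skip"  # identical content, nothing to do
    delta = remote.updated_at - local.updated_at
    if abs(delta) <= tolerance:
        # Too close to call: fall back to whichever node is deemed more important.
        return "pull" if tie_winner == "remote" else "push"
    # Otherwise the most recently updated side wins.
    return "pull" if delta > timedelta(0) else "push"
```

Keeping the tie-break as an explicit parameter makes the "which node is more important" decision visible rather than burying it in the timestamp comparison.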
Step 3 - Push local sync transaction log to remote
Local: PUT /remotehost/items/transactions with a body containing the local sync transaction log.
The remote node might process this synchronously (if it's small and quick) or asynchronously (think 202 ACCEPTED) if it's likely to incur a lot of overhead. Assuming a synchronous operation, the outcome will be either 200 OK or 409 CONFLICT, depending on success or failure. In the case of a 409 CONFLICT the process has to be started again, since there has been an optimistic locking failure at the remote node (someone changed the data during the sync). The remote updates are processed under their own application transaction.
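The handling of those status codes on the local side could look roughly like this. It is a sketch under assumptions: run_sync_round is a hypothetical callable that performs Steps 1 to 3 and returns the HTTP status of the PUT, and the limit of three attempts is arbitrary.

```python
from typing import Callable


def push_with_retry(run_sync_round: Callable[[], int], max_attempts: int = 3) -> str:
    """Repeat the whole sync round while the remote reports a conflict."""
    for _ in range(max_attempts):
        status = run_sync_round()
        if status == 200:
            return "applied"    # remote processed the log synchronously
        if status == 202:
            return "accepted"   # remote will process the log asynchronously
        if status == 409:
            continue            # optimistic lock failed: start again from Step 1
        raise RuntimeError(f"unexpected status {status}")
    raise RuntimeError("gave up after repeated conflicts")
```

Note that a conflict restarts the entire round rather than just re-sending the PUT, because the remote transaction log has changed and the comparison in Step 2 is no longer valid.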
Step 4 - Update locally
The data pulled in Step 2 is applied locally under an application transaction.
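One way to apply the pulled data atomically, assuming purely for illustration that the local store is an SQLite table named items with columns item_id, updated_at and body, is sketched below. The function name apply_pulled and the schema are hypothetical.

```python
import sqlite3


def apply_pulled(conn: sqlite3.Connection, pulled) -> None:
    """Apply pulled (item_id, updated_at, body) rows as one transaction."""
    # Using the connection as a context manager commits if the block
    # succeeds and rolls back if any statement raises.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO items (item_id, updated_at, body) "
            "VALUES (?, ?, ?)",
            pulled,
        )
```

If any row fails, none of the pulled changes are kept, so the local node is never left half-synchronized.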
While the above is not perfect, it does demonstrate how REST can be used during a bi-directional synchronization process. There are several situations where local and remote may get into trouble, and having remote pull data from local is probably more efficient than stuffing it into a big PUT.