How to implement efficient heterogeneous microservice data queries?

Question

Our team has an idea of implementing a simple declarative DSL that would let users query the enterprise's domain model via a single interface without caring which specific microservices to call to get specific portions of data and how to then relate and combine them.

The suggested syntax is based on SQL, with two restrictions:

It is much more limited: no grouping or aggregation, no explicit subqueries, no functions, etc.
Joins cannot be specified explicitly; they are implicit, derived from the predefined schema (entities and relations).

Example:

SELECT entityTypeOne.name, entityTypeTwo.value, entityTypeTwo.date
 WHERE entityTypeOne.name LIKE 'Sample%'
   AND entityTypeTwo.date BETWEEN (2015-05-01, 2015-05-31)

Expected result:

╔════════╦═══════╦════════════╗
║  name  ║ value ║    date    ║
╠════════╬═══════╬════════════╣
║ London ║  1000 ║ 01/05/2015 ║
║ London ║  2000 ║ 02/05/2015 ║
║ London ║  3000 ║ 03/05/2015 ║
║ Moscow ║  2000 ║ 02/05/2015 ║
║ Moscow ║  9000 ║ 05/05/2015 ║
║ Tokyo  ║  1000 ║ 30/05/2015 ║
╚════════╩═══════╩════════════╝

The underlying entity-relation schema knows that the entities are related like this: entityTypeOne.id = entityTypeTwo.parentId, which creates an implicit join.

The "query engine" should know that it will first query the entityTypeTwo microservice, applying the date range filter on the server, and then query the entityTypeOne microservice, applying an id filter based on the previous query's result.
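To make the intended plan concrete, here is a minimal sketch of that two-step execution followed by an in-memory join. Everything in it is an assumption for illustration: the two services are replaced by in-memory lists, and the names fetch_entity_two, fetch_entity_one and run_query are hypothetical, not part of any real API.

```python
from datetime import date

# Hypothetical in-memory stand-ins for the two microservices.
ENTITY_ONE = [
    {"id": 1, "name": "Sample A"},
    {"id": 2, "name": "Sample B"},
    {"id": 3, "name": "Other C"},
]
ENTITY_TWO = [
    {"parentId": 1, "value": 1000, "date": date(2015, 5, 1)},
    {"parentId": 1, "value": 2000, "date": date(2015, 5, 2)},
    {"parentId": 2, "value": 2000, "date": date(2015, 5, 2)},
    {"parentId": 3, "value": 9000, "date": date(2015, 5, 5)},
    {"parentId": 2, "value": 500, "date": date(2015, 6, 1)},
]


def fetch_entity_two(date_from, date_to):
    """Step 1: the entityTypeTwo service applies the date range filter."""
    return [r for r in ENTITY_TWO if date_from <= r["date"] <= date_to]


def fetch_entity_one(ids, name_prefix):
    """Step 2: the entityTypeOne service applies the id and name filters."""
    return [
        r for r in ENTITY_ONE
        if r["id"] in ids and r["name"].startswith(name_prefix)
    ]


def run_query(name_prefix, date_from, date_to):
    children = fetch_entity_two(date_from, date_to)
    # Only ask the second service for parents that can appear in the result.
    parent_ids = {c["parentId"] for c in children}
    parents = {p["id"]: p for p in fetch_entity_one(parent_ids, name_prefix)}
    # Inner join on entityTypeOne.id = entityTypeTwo.parentId, flattened
    # into (name, value, date) rows.
    return [
        (parents[c["parentId"]]["name"], c["value"], c["date"])
        for c in children
        if c["parentId"] in parents
    ]


rows = run_query("Sample", date(2015, 5, 1), date(2015, 5, 31))
```

With real services, the id set passed to the second call could become large, so batching or a different query order might be needed; this sketch ignores that.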

The problems we currently see:

Representing the object-relation schema.
Figuring out the optimal order of querying.
Denormalizing resulting data.

I was wondering if this is a known problem and if there are any algorithms to check (maybe something from graph theory)?
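As one possible starting point on the graph-theory side, the schema could be held as an undirected graph whose nodes are entities and whose edges are relations. A breadth-first search then yields the shortest chain of implicit joins between two entities named in a query. The sketch below assumes a hypothetical third entity (entityTypeThree) and hypothetical helper names (build_graph, join_path); it does not address choosing the cheapest query order.

```python
from collections import deque

# Hypothetical schema: (left entity, left field, right entity, right field).
RELATIONS = [
    ("entityTypeOne", "id", "entityTypeTwo", "parentId"),
    ("entityTypeTwo", "id", "entityTypeThree", "parentId"),
]


def build_graph(relations):
    """Build an adjacency list; each edge carries its join condition."""
    graph = {}
    for left, left_field, right, right_field in relations:
        condition = f"{left}.{left_field} = {right}.{right_field}"
        graph.setdefault(left, []).append((right, condition))
        graph.setdefault(right, []).append((left, condition))
    return graph


def join_path(graph, start, goal):
    """Return the shortest list of join conditions from start to goal,
    or None if the two entities are not connected."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for neighbour, condition in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [condition]))
    return None


GRAPH = build_graph(RELATIONS)
path = join_path(GRAPH, "entityTypeOne", "entityTypeThree")
```

Here path holds the two join conditions that link entityTypeOne to entityTypeThree through entityTypeTwo. A fuller planner would presumably weight edges or nodes by estimated filter selectivity rather than treating every hop as equal.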

This is the closest thing I could find so far:

What is a heterogeneous query?

If it makes things simpler we can assume that microservices are exposing data via OData.

I *believe* this is similar to what GraphQL was designed for. I don't have a lot of experience with it, but it might give you a starting point. https://facebook.github.io/react/blog/2015/05/01/graphql-introduction.html https://learngraphql.com/basics/introduction — Danny G, Jan 06 '16 at 19:37

score 2 · Answer 1 · answered Jan 06 '16 at 23:08

If what you are trying to do is present a single endpoint to many APIs, you might find some value in Netflix's Falcor project.

Falcor is not a query engine. It is a library for "efficient data fetching." It is one example of a growing set of tools delivering "Demand Driven Architectures" -- alternatives to traditional REST services that let the author of a client specify what they want in terms of a canonical data model, so the UI (the demand) no longer has to be developed in tandem with the backend. The "fetch" tools translate requests against the canonical model into calls to individual REST services, and a combination of in-browser and reverse-proxy caches makes things efficient by avoiding repeated calls to the data services for the same data.

To paraphrase Falcor lead author Jafar Husain: picture your service graph not as a bunch of discrete services, but as a single massive JSON Graph document. That is what users feel they are making requests of -- and Falcor handles the necessary caching, batching and routing that make it efficient.

It's almost as if these tools bring SELECT and WHERE clause behavior to a collection of REST APIs. And while that's not quite the same as building an efficient query API on top of REST, it may offer you the same benefits -- without your having to invent an efficient query processor, which frankly could take years.
