A recent question mentioned the term canonical schema in the context of microservices architecture. After reading the Wikipedia article, as well as one of the answers to the question, I still don't understand what canonical schema is about. I get that it's a way to decouple the microservices by doing some magic with the data model, but I'm lost when it comes to the concrete application of the pattern. Other resources talk about standardized information sets, which only makes things more cryptic.

Imagine the microservice A is consuming messages from the microservice B through a message queue service. Let's say those JSON messages contain information about the availability of the products in a warehouse. While A doesn't have to know anything about the existence or location of B, I imagine that it still needs to know:

  • That the messages are formatted using JSON. If B suddenly starts to format messages in XML, I can hardly see how A would magically adapt itself, unless it was specifically programmed to deal with both JSON and XML messages.

  • The actual data model, limited to the part used by A. If A simply needs the product ID and the availability, A need not know that the JSON message also contains the product's full name, or the location within the warehouse. But it has to know that the product ID is stored in the field /product/id and formatted as a GUID, and that the quantity is stored in /quantity and formatted as a number. Again, if B switches to long-based IDs for the products, A won't be able to deal with it, unless the programmer had this potential format change in mind.
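To make the two points above concrete, here is a minimal sketch of what A's side might look like. The function name and the sample message (including where the product name and location live) are my own assumptions; only /product/id as a GUID and /quantity as a number come from the description above.

```python
import json
import uuid
from typing import Tuple, Union


def read_availability(raw: str) -> Tuple[uuid.UUID, Union[int, float]]:
    """Extract only the two fields A depends on and ignore the rest."""
    message = json.loads(raw)  # raises ValueError if B starts sending XML
    # str() so that a numeric (long-based) ID is rejected with ValueError too
    product_id = uuid.UUID(str(message["product"]["id"]))
    quantity = message["quantity"]
    if isinstance(quantity, bool) or not isinstance(quantity, (int, float)):
        raise ValueError("quantity must be a number")
    return product_id, quantity


# Hypothetical message from B; "name" and "location" are ignored by A.
sample = (
    '{"product": {"id": "3f2504e0-4f89-11d3-9a0c-0305e82c3301",'
    ' "name": "Widget", "location": "aisle 4"}, "quantity": 12}'
)
product_id, quantity = read_availability(sample)
```

A is indifferent to extra fields, but both assumptions (JSON, and the shape of the two fields) are hard-coded: an XML payload or a long-based ID makes the function raise instead of adapting.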

So, what is this design pattern about, and how is it used in practice? Given my example with the services A and B, what would happen if the canonical schema pattern were applied?

Maybe it's all about A reading the schema of B and adapting dynamically to it? So is it just like Swagger, or like reading WSDL at runtime to determine how a SOAP service should be called?

Arseni Mourzenko
  • `A is consuming messages from the microservice B` maybe the problem is here. No microservice should directly consume another one, because that couples part of the solution to an external system it has no control over, and makes it dependent on that system. So are microservices a sort of 'autistic system'? – Laiv Nov 16 '16 at 08:39

2 Answers

None of the above. The article (and the post you link to) specifically say that microservices tend not to use a canonical schema.

With a Canonical Schema, there's no magic; the whole point of it is that within your SOA ecosystem you have a common model and format for a given 'thing'. It's like a contract. "Anytime we represent a User object, it will have the following schema: ".
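As a rough illustration of that "contract" idea, the sketch below declares one agreed definition of a User and checks payloads against it. The field names (`id`, `email`, `display_name`) and the function name are hypothetical; the actual schema would be whatever your organisation agrees on.

```python
# Hypothetical canonical definition of a "User". The field names are
# illustrative only; they are not taken from any real schema.
CANONICAL_USER = {
    "id": str,
    "email": str,
    "display_name": str,
}


def conforms_to_canonical_user(payload: dict) -> bool:
    """Return True only if payload has exactly the agreed fields and types."""
    if set(payload) != set(CANONICAL_USER):
        return False
    return all(
        isinstance(payload[name], kind)
        for name, kind in CANONICAL_USER.items()
    )


ok = conforms_to_canonical_user(
    {"id": "u-1", "email": "ann@example.com", "display_name": "Ann"}
)
bad = conforms_to_canonical_user({"user_id": "u-1", "mail": "ann@example.com"})
```

Nothing adapts at runtime here: every service that emits or accepts a User is expected to produce exactly this shape, and anything else is rejected.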

Microservices, by contrast, tend to enforce their own data needs, and do whatever transformations they need internally, or when communicating with a different service themselves. Still no dynamic schema munging.

Paul
  • So the SOA's canonical schema consists of sharing a data model between the service and its consumer, such as, in .NET, the same data model assembly being used by both the service and the consumer projects. The microservices approach, on the other hand, consists of not sharing a data model; instead, the consumer relies on whatever part of the message it needs (for instance by walking manually through the JSON message). Is my understanding correct now? – Arseni Mourzenko Nov 16 '16 at 04:19
  • Yes, though the sharing isn't necessarily at the DLL level as you still might have a heterogeneous architecture from the app implementation perspective. One endpoint might be in Java while another in .NET, for example. So the shared model would be defined in a neutral language (e.g. XSD) and then each endpoint uses that as a mechanism for schema enforcement. – Paul Nov 16 '16 at 10:54

The use of canonical schemas predates microservices and is common practice in Enterprise Application Integration and Enterprise Service Buses when integrating multiple systems, both within your organisation and with external partner systems or services.

The goals of adopting canonical schemas include:

  • Attempting to establish a common / best practice nomenclature and set of models for your enterprise data entities (this would have spanned integration and analytics, including data warehousing). If possible, existing or de facto industry standard formats should be preferred over inventing new proprietary schemas in your enterprise. (e.g. different systems in a retail enterprise may refer to a Product as a sku, an item_id, Product_Id etc. Choose one preferred name and model for Product, and use that throughout the enterprise.)
  • To prevent system-specific naming, typing and modelling opinions (from internal systems and from external partner systems) from 'bleeding' into your enterprise.
  • To prevent point-to-point system mapping complexity. As the number of systems in an enterprise increases, the number of mappings grows roughly quadratically if each new system needs to translate between its own schema and that of every existing system. With a canonical schema, each system only needs to map its internal representation to / from the canonical format for each of the message types that it uses, irrespective of the number of other systems in the enterprise. This is logically closely related to hub and spoke architecture.
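The last goal can be sketched as follows. The system names (a warehouse and a web shop), their field names and the canonical field names are all hypothetical; the point is only that each system owns one translator to and one from the canonical form, instead of one per peer.

```python
def point_to_point_mappings(n: int) -> int:
    """One directed translator for every ordered pair of systems."""
    return n * (n - 1)


def canonical_mappings(n: int) -> int:
    """One 'to canonical' and one 'from canonical' translator per system."""
    return 2 * n


# Hypothetical warehouse system: calls the product a "sku".
def warehouse_to_canonical(record: dict) -> dict:
    return {"product_id": record["sku"], "quantity": record["qty"]}


# Hypothetical web shop: calls the same thing an "item_id".
def canonical_to_webshop(canonical: dict) -> dict:
    return {"item_id": canonical["product_id"], "stock": canonical["quantity"]}


# The warehouse never needs to know the web shop's field names.
shop_record = canonical_to_webshop(warehouse_to_canonical({"sku": "A-1", "qty": 3}))
```

Under this simple counting, 10 systems need 90 directed point-to-point translators but only 20 canonical ones (per message type). For two or three systems the canonical route is no cheaper, which may be part of why small setups skip it.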

Perhaps one of the reasons that 'explicit' canonical schemas are less common in a modern enterprise with multiple bespoke systems (including microservice systems) is that, where centralized design leadership and architectural roles exist, there's a much better chance that the system interfaces (including messages and APIs) will be 'canonical' to your enterprise from the outset. As a result, message and API integration between your microservices will already share common naming, typing, and entity definitions (although an anti-corruption layer in each system is still a good idea for isolation and future proofing).

StuartLC