The "canonical" idea is pervasive in software; patterns like Canonical Model, Canonical Schema, and Canonical Data Model seem to come up again and again in development.

Like many developers, I've often followed, uncritically, the conventional wisdom that you need a canonical model, otherwise you'll face a combinatorial explosion of mappers and translators. Or at least, I used to do that until a couple of years ago when I first read the somewhat-infamous EF Vote of No Confidence:

The hypotheses that once supported the pursuit of canonical data models didn’t and couldn’t include factors that would be discovered once the idea was put into practice. We have found, through years of trial and error, that using separate models for each individual context in which a canonical data model might be used is the least complex approach, is the least costly approach, and the one that leads to greater maintainability and extensibility of the applications and endpoints using contextual models, and it’s an approach that doesn’t encourage the software entropy that canonical models do.

The essay presents no evidence of any kind to support its claims, but it did make me question the CDM approach long enough to try the alternative, and the resulting software didn't explode, literally or figuratively. But that doesn't mean a whole lot in isolation; I could have just been lucky.
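
For reference, the "combinatorial explosion" argument is just arithmetic, and a minimal sketch makes it concrete. The function names are hypothetical, and the counts assume the worst case, where every endpoint has to exchange data with every other endpoint; real systems are usually sparser than that.

```python
def point_to_point_mappers(n: int) -> int:
    """Directed translators needed if each of n endpoints maps straight to every other one."""
    return n * (n - 1)


def canonical_mappers(n: int) -> int:
    """Translators needed if each endpoint maps only to and from one shared model."""
    return 2 * n


# Compare the two counts for a few system sizes.
for n in (2, 3, 5, 10, 20):
    print(f"{n:>2} endpoints: {point_to_point_mappers(n):>3} direct vs {canonical_mappers(n):>2} via canonical model")
```

The counts only say how many mappers exist, not what each one costs to write or maintain, and that cost is exactly the part I'm asking about.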

So I'm wondering, has any serious research been done into the practical, long-term effects of having a canonical model vs. contextual models in a software system or architecture?

Or, if it's too early to be asking that, then have any developers/architects written about personal experiences switching from a CDM to independent contextual models, or vice versa, and what the practical effects were on things like productivity, complexity, or reliability?

What about the differences at different levels, i.e. using the same model across a single application vs. using it across a system of applications or an entire enterprise?

(Facts only, please; war stories are welcome but no speculation.)

Aaronaught
  • I have no idea what they're talking about in the "Vote of No Confidence" article. The rest of the article makes sense, but the section about Canonicalism is so abstract that it doesn't mean anything. It would have been nice if they'd provided an example of what they were talking about. – Robert Harvey Apr 27 '11 at 23:57
  • @Robert: I have no idea if there's any truth to it but it's pretty clear to me what it means... surely you're familiar with the CM/CDM patterns? In its simplest incarnation, it's sharing a single monolithic EF (or Linq to SQL, or whatever) model/context across the entire application instead of the more (perceived) duct-tapey approach of writing specific queries for specific parts of the application. At a higher level, it's adopting a universal schema for an entire organization that all individual applications/systems must translate to. – Aaronaught Apr 28 '11 at 00:25
  • It sounds like the EF debate centers on table first vs class first design, as table-first design (where the classes are generated from the tables) suggests a monolithic model (although there are always specialized queries and views), while class-first design (where the tables are generated from the classes) suggests a more duct-tapey, flexible OO model. I'm a bit old-school, preferring the table-first approach, but I could see how the class-first approach would be attractive to some. – Robert Harvey Apr 28 '11 at 01:24
  • @Robert, it wasn't my intention to bring up the "EF debate", I just took a single quote out of that page because it was where I first heard the argument (and I'm not sure if I've heard it clearly expressed anywhere else). Separately, I'm not sure I agree that table-first design actually represents a monolithic model; the database *itself* is, but only the DBMS is truly aware of that - different parts of the application tend to only be aware of the specific tables and queries they depend on. – Aaronaught Apr 28 '11 at 13:56

1 Answer

In response to the EF Vote of No Confidence article, Tim Mallalieu writes:

We are not recommending that folks return to the days where we were evangelizing the use of XSD for “canonical schemas”. I don’t believe that people think that this is tractable. What we do believe, however, is that it is desirable to have a single meta-model (EDM if you will) with which you can describe many domain models and that by having a single grammar we can provide a set of common services on any given domain model.

For example, consider an application that is to be written against a database with 600 tables. Do I believe that this app should have a single model with 600 Entity Types in it? No… Furthermore, do I believe that any given domain entity (say Customer) has only one shape in that app and that this shape must be the canonical shape for the entire Enterprise?… Heck no.

The Wikipedia article for Canonical Model references things like Enterprise Service Bus, Service-Oriented Architecture and CORBA, things which seem like they're hardly talked about anymore. They were all posed as the solution to the data proliferation and communication challenges of the enterprise, the One Ring to Rule Them All™. Did they succeed? Or did they collapse under their own weight?


You asked for personal experiences, so I'll give you one. In the aerospace industry we use telemetry a lot. One of the challenges with telemetry systems is finding a way for different test ranges to communicate test data with each other in a meaningful way. That problem seems simple enough, until you attempt to define a data dictionary of common terms.

What does "altitude" mean? Is it the height above the ground, or is it the height above sea level? What if you're talking about a submarine? Then it's depth, not altitude. To the Army, the word "transmission" has a different meaning when you are referring to a radar dish than it does when you are referring to a ground-based vehicle. The wing surface that causes an aircraft to roll is called an "aileron" on some planes, and an "elevon" on others.

That's only a hint at the mountain of problems that follow. Although there are standards for data communications, every test range is different, and has different needs, goals and priorities. Standards can differ even among different projects on the same range. For this reason, test ranges understand that the solution will not come by replacing everything with a single, monolithic system, but by agreeing on simple communication protocols and providing ways to translate from one range's vocabulary to another.
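
As a toy sketch of what translating between vocabularies can look like, assume two ranges that both report "altitude" but mean different things by it. Everything here is hypothetical and for illustration only: the range names, the field names, the choice of feet above ground versus metres above sea level, and the `terrain_elevation_m` parameter. It is not drawn from any real telemetry standard.

```python
from dataclasses import dataclass

FEET_PER_METRE = 3.28084


@dataclass(frozen=True)
class RangeAAltitude:
    """Hypothetical range A: 'altitude' is feet above ground level."""
    feet_agl: float


@dataclass(frozen=True)
class RangeBAltitude:
    """Hypothetical range B: 'altitude' is metres above mean sea level."""
    metres_msl: float


def a_to_b(reading: RangeAAltitude, terrain_elevation_m: float) -> RangeBAltitude:
    """Translate A's term into B's. Needs the terrain elevation, which A's model never carries."""
    metres_agl = reading.feet_agl / FEET_PER_METRE
    return RangeBAltitude(metres_msl=metres_agl + terrain_elevation_m)


def b_to_a(reading: RangeBAltitude, terrain_elevation_m: float) -> RangeAAltitude:
    """Translate B's term back into A's, using the same extra context."""
    metres_agl = reading.metres_msl - terrain_elevation_m
    return RangeAAltitude(feet_agl=metres_agl * FEET_PER_METRE)


# Each range keeps its own model; only the translator knows about both.
at_b = a_to_b(RangeAAltitude(feet_agl=1000.0), terrain_elevation_m=500.0)
print(at_b)
```

Even in this tiny case the translation is more than a unit conversion: it needs a piece of context (the terrain elevation) that neither model holds on its own, which is the sort of thing a shared data dictionary tends to gloss over.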


The problems that large companies face are similar. Microsoft tends to think in monolithic terms, but that's because their company is, by and large, monolithic. As soon as you need to communicate between different companies with vastly different cultures and ways of doing business (or even between disparate departments in the same company), the One Ring to Rule Them All™ immediately begins to break down.

Glorfindel
Robert Harvey
  • It's interesting to know that the Microsoft designers themselves don't recommend the shared mega-model; unfortunately that's exactly how Linq to SQL and EF and many other ORMs tend to be used. Regardless, there are still a lot of folks out there promoting the Canonical Model concept and I'd still like to know how it fares in practice vs. the alternative. – Aaronaught Apr 28 '11 at 13:59
  • @Aaronaught: See my edit. – Robert Harvey Apr 28 '11 at 19:29
  • It's a good edit, and FYI I've upvoted it. I am, however, already aware of many of the specific incongruity issues that crop up when trying to canonicalize a model; I'm particularly interested in learning about experiences or long-term data about teams that have invested the time to resolve those incongruities, so as to have one mapper per endpoint (end-to-model), vs. investing the time in separate end-to-end mappers instead, and which approaches truly yielded the lowest cost / lowest complexity solutions. – Aaronaught Apr 29 '11 at 18:25
  • Or to put it more succinctly: I know that it's a lot of work to create and maintain a canonical model; what I want to know is when (if ever) the benefits have been observed to outweigh the cost. – Aaronaught Apr 29 '11 at 18:27