I'd follow these principles:
- Keep (meta)data format expandable; do not force expansion.
- Allow verification and sanity checks.
- Allow and encourage comments.
- Always version the data.
Point by point, from last (easiest) to first (hardest).
Always version the data
Your metadata representation will evolve. Version each such change (e.g. using semver). This prevents the problems of guessing what format the data is really in, and how to correctly interpret it when different versions allow for different interpretations.
Your implementation(s) should be ready to interpret a range of versions, and handle older versions' data which are inevitably less complete and detailed.
If possible, allow (but not enforce) version info in each node of the metadata document. This allows for gradual migration of parts that need the new version's features without a forced migration of the rest.
This is problematic if you don't have a single versioning authority, and several independent users create diverging versions of metadata formats. This can also be solved, but for simplicity it's best avoided.
Allow and encourage comments
Let people add free-form context to the formal representation. Loss of such context leads to second-guessing and horrible mistakes as the data are edited and/or migrated to newer representations.
Encourage adding comments in your metadata-editing tool. Restrict the amount and format of the comments as little as possible.
JSON is a notoriously bad config format because it lacks comments; if it is used, allow an extra "comment"
field in any object.
Note that comments should be normally accessible as a part of the metadata, so that any tools that evolve would easily see and update them.
Allow verification and sanity checks
As much as your problem domain allows, add an automatic tool to check the metadata and detect common problems. Check for inconsistent values, for inconsistent names, spelling errors (a particularly nasty problem in stringly-typed data which most metadata are). If you have some formal rules (a kind of schema), check for its violations (e.g. number format and ranges, date format, empty values, any references to other nodes, etc).
Most likely you will have to allow some violations to stay, until a better way to express something is implemented. But you want to make them easily visible.
Keep (meta)data format expandable; do not force expansion
Take a format which is easy to expand and understand. Most likely it's going to be a tree (xml, yml, s-expressions, json, ucf, etc) with primitive attributes, lists, and other trees as nodes.
I would only allow "value", "version", and "comment" to carry primitive values, and always combine them in a subtree / "object" under the "real" name:
foo {
comment: the metasyntactic variable. # The comment to "foo".
version: 1.2 # The version of the schema for "foo"; optional.
value: bar # The value we store.
}
Eventually you'll want to add more structure:
foo {
comment: ...
version: 2.0
value: bar
constraints {
allow-empty: false
min-length: 3
all-lowercase: true
}
}
Then you will want to factor out common parts. Allow for some kind of substitution that preserves the correctness of the format, and likely a service namespace:
@define {
@version { # Only applicable to nodes with this version range.
from: 2.0
to: 3.1
constraint var-name {
allow-empty: false
min-length: 3
all-lowercase: true
}
}
}
...
foo {
comment: ...
version: 2.0.1
value: bar
constraints {
@var-name # Refer to a factored-out definition.
max-length: 10
}
}
moo {
comment: What the cow says.
version: 1.5.7 # Old version node, old format.
value: MOO!
}
There's going to be a lot more details involved in such an piecemeal-evolvable document. I still think it's a good enough starting point. The only factor that defines architecture's longevity is its ability to change, they say.