4

An interesting question I've stumbled upon:

Let's assume a Java application creates a data model, converts this data to a JSON object with two fields, and uploads it to a server:

{
    "FirstName": "Foo",
    "LastName": "Bar"
}

Now a different technology, say JS, PHP, etc., on the server needs to process the data.

Both sides are well covered by unit tests, but naturally integration across different languages and technologies is hard to test in depth.

Is there a known principle or technique that can validate the data on both ends? Let's say there is a contract that says LastName must never be shorter than 2 characters. This is a business rule, a hard constraint on the data.

Now suppose the Java implementation has a bug or simply overlooked the requirement for LastName. Both sides complete unit tests but they fail hard during integration.

Is there some technique that allows specifying such rules across languages? I am not talking about runtime rejection of malformed data, but rather about ensuring consistency on both sides of an application.

Samuel
  • It sounds like you're looking for an [XML Schema](https://www.w3.org/standards/xml/schema), but for JSON. Is that correct? – Dan Pichelman Mar 23 '18 at 21:54
  • You're not alone - I've been wondering if it is possible too. Still not sure if there is any ready-made solution. These are the ideas I have come up with so far - the validation rules must: (1) be coded as expression trees, compilable at run-time into every target language; (2) be immediately testable in the language of declaration; (3) produce exactly the same result on every platform; (4) be an essential part of the object schema, so that applications that want to exchange the same entities also know the rules; (5) respect i18n and l10n. – yegodm Mar 23 '18 at 21:55
  • @DanPichelman, JSON was an example; there is JSON Schema too. The requirements may be much more complex than a schema can handle, for instance only a subset of values allowed in `LastName` when `FirstName` is "World", etc. – Samuel Mar 23 '18 at 22:31
  • @yegodm I agree, although I would prefer code generation to avoid the need to spin up a rule engine and incur overhead – Samuel Mar 24 '18 at 07:30
  • @DanPichelman the longer I think about it, the more I agree with you that I might need JSON Schema validation with bindings to multiple languages. If my domain is Java I want to plug into Java naturally; if it is JS, then there. I know there is something similar from Newtonsoft called Json Schema and it looks right, but it is limited to C# projects only. – Samuel Mar 24 '18 at 09:03
  • Code generation will also do - should not be a problem. Overall there are two major challenges supporting: (1) entity-wide constraints, involving multiple properties. For example, used car offer cannot specify `car.state == New` when `car.mileage > 0`, and (2) domain-wide rules, like `user.alias` must be unique. In that case the domain must be consulted to validate the rule, possibly involving a remote call. – yegodm Mar 24 '18 at 14:00
  • You need a common language to describe the validation rules on both ends. JSON Schema is surely an option; another one is to use a programming ecosystem which is suitable for all the clients or servers involved, so you can simply put the logic into one reusable library. For example, JS/Node may allow you to write modules you can use in browser code as well as in server code. – Doc Brown Sep 24 '18 at 15:11

3 Answers

1

The phrase I've come across for this is single source of truth:

In information systems design and theory, single source of truth (SSOT) is the practice of structuring information models and associated data schema such that every data element is stored exactly once. Any possible linkages to this data element (possibly in other areas of the relational schema or even in distant federated databases) are by reference only. Because all other locations of the data just refer back to the primary "source of truth" location, updates to the data element in the primary location propagate to the entire system without the possibility of a duplicate value somewhere being forgotten.

Software can be generated as noted above:

In software design, the same schema, business logic and other components are often repeated in multiple different contexts, while each version refers to itself as "Source Code". To address this problem, the concepts of SSoT can also be applied to software development principals using processes like recursive transcompiling to iteratively turn a single source of truth into many different kinds of source code, which will match each other structurally because they are all derived from the same SSoT.

Specifically, some cross language specifications include:

Projects such as Apache Thrift normalize this:

Apache Thrift allows you to define data types and service interfaces in a simple definition file. Taking that file as input, the compiler generates code to be used to easily build RPC clients and servers that communicate seamlessly across programming languages. Instead of writing a load of boilerplate code to serialize and transport your objects and invoke remote methods, you can get right down to business.
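To make the "one definition, many generated sources" idea concrete, here is a minimal sketch in Python. It is not how Thrift itself works, and everything in it is an assumption made for illustration: the `RULES` list, the `emit_java` and `emit_js` helpers, and the generated `PersonValidator` and `isValidPerson` names are all hypothetical. It only shows that a rule such as "LastName has at least 2 characters" can be written once and rendered into source text for both sides.

```python
# Hypothetical single source of truth: the business rules live in one place.
RULES = [
    {"field": "FirstName", "minLength": 1},
    {"field": "LastName", "minLength": 2},
]


def emit_java(rules):
    """Render the rules as the source text of a Java validator class."""
    lines = [
        "import java.util.Map;",
        "",
        "public final class PersonValidator {",
        "    public static boolean isValid(Map<String, String> p) {",
    ]
    for rule in rules:
        f, n = rule["field"], rule["minLength"]
        lines.append(
            f'        if (p.get("{f}") == null || p.get("{f}").length() < {n}) return false;'
        )
    lines += ["        return true;", "    }", "}"]
    return "\n".join(lines)


def emit_js(rules):
    """Render the same rules as the source text of a JavaScript function."""
    lines = ["function isValidPerson(p) {"]
    for rule in rules:
        f, n = rule["field"], rule["minLength"]
        lines.append(
            f'  if (typeof p["{f}"] !== "string" || p["{f}"].length < {n}) return false;'
        )
    lines += ["  return true;", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Both outputs are derived from RULES, so they cannot drift apart.
    print(emit_java(RULES))
    print()
    print(emit_js(RULES))
```

Changing `minLength` in `RULES` and regenerating updates both validators at once, which is the property the quoted passage describes.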

Paul Sweatte
  • Paul, thank you for improving the term I was describing in less precise terms. While not really an answer I see why posting this abstract would be hard when doing it as a comment. – Samuel Sep 25 '18 at 10:48
  • @Samuel An interface file such as that used by [OpenAPI](https://en.wikipedia.org/wiki/OpenAPI_Specification) or an interface definition language such as the [Web IDL](https://www.w3.org/TR/WebIDL-1/) may be closer to the cross-language contract you are looking for. There was a question about cross-language [naming conventions](https://stackoverflow.com/questions/8944851/naming-conventions-for-a-multi-programming-language-project/12902653#12902653) which may help. – Paul Sweatte Sep 25 '18 at 13:48
1

Integration Testing

You succinctly describe the root issue here, namely lack of integration testing.

Both sides complete unit tests but they fail hard during integration.

The team building the Java application forgot to include validation and unit tests for when LastName is less than two characters.

The team building the PHP application, on the other hand, remembered to include the validation and unit tests.

Luckily, the QA team knew about the LastName requirement and added the name Jin-u O in their integration testing suite. When they called an API operation in the Java application using that name, expecting a 400-level validation error, they instead got a 200 and the test failed.

Is integration testing easy? Absolutely not. But it's definitely worth it if you've got the time and resources to test your applications against an objective set of criteria.
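One way to write down such an objective set of criteria is a shared list of test cases that every implementation runs. The sketch below is in Python and rests on assumptions: the fixture layout, the `is_valid_person` stand-in, and the `run_contract_cases` helper are hypothetical names, and in a real project the cases would probably live in a JSON file that the Java and PHP test suites both load.

```python
import json

# Hypothetical shared fixture. Kept inline here so the sketch is self-contained.
SHARED_CASES = json.loads("""
[
  {"name": "typical name",
   "payload": {"FirstName": "Foo", "LastName": "Bar"}, "valid": true},
  {"name": "one-character last name",
   "payload": {"FirstName": "Jin-u", "LastName": "O"}, "valid": false},
  {"name": "missing last name",
   "payload": {"FirstName": "Foo"}, "valid": false}
]
""")


def is_valid_person(payload):
    """Stand-in for one side's validator (hypothetical)."""
    last_name = payload.get("LastName")
    return isinstance(last_name, str) and len(last_name) >= 2


def run_contract_cases(validator, cases):
    """Return the names of the cases where the validator disagrees with the contract."""
    return [c["name"] for c in cases if validator(c["payload"]) != c["valid"]]


if __name__ == "__main__":
    # A validator that honours the rule reports no disagreements.
    print(run_contract_cases(is_valid_person, SHARED_CASES))
    # A validator that forgot the rule is caught by the shared cases.
    print(run_contract_cases(lambda payload: True, SHARED_CASES))
```

Because each side runs the same cases inside its own unit tests, a forgotten rule shows up before the two applications are ever wired together.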

Dan Wilson
0

JSON Schema has validator implementations in more than 17 languages, which probably makes it the most cross-language object validation methodology. The implementations do vary a bit in completeness, though.
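As a rough illustration, the LastName rule from the question could be expressed with the standard `required`, `type` and `minLength` keywords. The Python sketch below keeps the schema as a plain dict and pairs it with a tiny hand-written `check` function so that it runs without dependencies. That function is hypothetical and covers only the keywords used here; it is not a real JSON Schema validator, and in practice each side would hand the same schema file to an existing validator library for its language.

```python
# Schema for the question's payload, using standard JSON Schema keywords.
PERSON_SCHEMA = {
    "type": "object",
    "required": ["FirstName", "LastName"],
    "properties": {
        "FirstName": {"type": "string"},
        "LastName": {"type": "string", "minLength": 2},
    },
}


def check(instance, schema):
    """Illustrative checker for the few keywords used in PERSON_SCHEMA.

    Returns a list of error messages; an empty list means the instance passed.
    """
    if not isinstance(instance, dict):
        return ["instance is not an object"]
    errors = []
    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"{name}: required property is missing")
    for name, rules in schema.get("properties", {}).items():
        if name not in instance:
            continue
        value = instance[name]
        if rules.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{name}: expected a string")
            continue
        if (
            "minLength" in rules
            and isinstance(value, str)
            and len(value) < rules["minLength"]
        ):
            errors.append(f"{name}: shorter than {rules['minLength']} characters")
    return errors


if __name__ == "__main__":
    print(check({"FirstName": "Foo", "LastName": "Bar"}, PERSON_SCHEMA))
    print(check({"FirstName": "Jin-u", "LastName": "O"}, PERSON_SCHEMA))
```

The schema itself is just data, so the same file can be checked into both code bases and used in each side's unit tests rather than only for runtime rejection.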

naught101