How to handle storing of arbitrary data in an application enablement platform

Question

I'm working on an AEP in the IoT space, where we've created a few domain-specific entities to allow users to model their data. Each entity has both a type and instances of that entity. These are:

Assets. e.g. an Asset Type could be Truck. Assets are designed to store both attributes (registration, type etc), and metric data about the Truck (location, speed etc). Metric data can be queried historically.
Events. e.g. Ignition On, Speeding. Events can be linked to Assets. They describe things that happened.
Cycles. e.g. Shift Started, Trip Started. Cycles are temporal, with a start & end time. They are tied to Assets and allow for powerful analytics (e.g. show me average distance travelled by a truck per shift).

This works well for a lot of use-cases, but we're frequently running into problems where users want to store data that doesn't fit into the above entities. Some examples (maintaining the transport theme):

Vehicle Maintenance scheduling
Vehicle Pre-Shift Checklist (could store as an Event but a bit contrived)
Parcel

We're considering creating an arbitrary "Object Type" which supports basic CRUD operations + mappings to our existing entities. This is similar conceptually to Tables in ServiceNow, Datastores in AWS Amplify etc. An example of how this might work:

Create a new Object Type (e.g. Vehicle Maintenance Schedule)
Define a schema / allowed properties for that Object Type, including relations to existing entities and other objects:

{
 "name": "Maintenance",
 "schema": {
   "vehicle": {
     "type": {
       "entityType": "ASSET-TYPE",
       "identifier": "person"
     }
   }
 }

Store entities in Postgres, with the data stored as a jsonb column
Have a mapping table that maps objects to entities, e.g.

CREATE TABLE object_entity_mapping {
 object_id UUID,
 object_type UUID,
 entity_id UUID,
 entity_type UUID,
 linked_field varchar
};

Concerns

Speaking with my colleagues, there's concerns about the scalability and complexity of the solution, particularly if complex queries are required. As our application is multi-tenanted it's hard to make guarantees about how our users will use the objects.

Some ideas we're discussing for performance:

Partition on the Object Type + index on the jsonb column
Limit the amount of nesting to 3 levels deep (we'd prefer users link to other entities/objects rather than store all the data in a huge document)
Potential limitation on data size

Questions

Does the approach as described make sense? Any pitfalls that we might be missing here?
Are there other tools/resources/common approaches to solving this problem?
Are there any alternative solutions that might be worth exploring?

To me it looks like you are forwarding schema definition responsibility to your clients. This may sound extremely flexible at first, it will become quickly a nightmare to implement and test it correctly through your APIs, too complicated to explain to your users and also difficult to write meaningful queries related to clients data. This always starts easy, but soon they will need to make more reports which will require digging through that data with a custom defined schema. — Florin C., Jan 12 '22 at 08:18

How to handle storing of arbitrary data in an application enablement platform

0 Answers0