I'm working on an AEP in the IoT space, where we've created a few domain-specific entities to allow users to model their data. Each entity has both a type and instances of that entity. These are:
- Assets. e.g. an Asset Type could be Truck. Assets are designed to store both attributes (registration, type etc), and metric data about the Truck (location, speed etc). Metric data can be queried historically.
- Events. e.g. Ignition On, Speeding. Events can be linked to Assets. They describe things that happened.
- Cycles. e.g. Shift Started, Trip Started. Cycles are temporal, with a start & end time. They are tied to Assets and allow for powerful analytics (e.g. show me average distance travelled by a truck per shift).
This works well for a lot of use-cases, but we're frequently running into problems where users want to store data that doesn't fit into the above entities. Some examples (maintaining the transport theme):
- Vehicle Maintenance scheduling
- Vehicle Pre-Shift Checklist (could store as an Event but a bit contrived)
- Parcel
We're considering creating an arbitrary "Object Type" which supports basic CRUD operations + mappings to our existing entities. This is similar conceptually to Tables in ServiceNow, Datastores in AWS Amplify etc. An example of how this might work:
- Create a new Object Type (e.g. Vehicle Maintenance Schedule)
- Define a schema / allowed properties for that Object Type, including relations to existing entities and other objects:
{
"name": "Maintenance",
"schema": {
"vehicle": {
"type": {
"entityType": "ASSET-TYPE",
"identifier": "person"
}
}
}
- Store entities in Postgres, with the data stored as a jsonb column
- Have a mapping table that maps objects to entities, e.g.
CREATE TABLE object_entity_mapping {
object_id UUID,
object_type UUID,
entity_id UUID,
entity_type UUID,
linked_field varchar
};
Concerns
Speaking with my colleagues, there's concerns about the scalability and complexity of the solution, particularly if complex queries are required. As our application is multi-tenanted it's hard to make guarantees about how our users will use the objects.
Some ideas we're discussing for performance:
- Partition on the Object Type + index on the jsonb column
- Limit the amount of nesting to 3 levels deep (we'd prefer users link to other entities/objects rather than store all the data in a huge document)
- Potential limitation on data size
Questions
- Does the approach as described make sense? Any pitfalls that we might be missing here?
- Are there other tools/resources/common approaches to solving this problem?
- Are there any alternative solutions that might be worth exploring?