I'm working on a project that revolves around a database of about 750,000 entities. Let's just say these entities are musical works. Each musical work is associated with an album, one or more composers, and some other entities.
The business that owns this database has bought services from other providers, which requires the business to "notify" these providers whenever a musical work gets created, updated or deleted in the database.
An important note here is that any given musical work is considered updated if the entity itself (the musical work) or any entity associated with it gets updated. In other words, if the composer of the musical work "Air On The G String" was renamed from "Bacch" to "Bach", then that musical work (and every other musical work related to that composer) should be considered updated.
The entities are stored in a relational database, and the database has no built-in functionality that would let us keep track of which entities have been created, updated or deleted. It is also a requirement not to rely on any such functionality. Furthermore, the graph of entities that makes up a single musical work can grow quite large, so I don't think creative SQL statements are the way to go for discovering created, updated or deleted entities.
What I've been thinking is to set up a job which, once a day, simply iterates over all musical works and creates a hash for each one. For any given musical work, the job will have to pull the work itself from the database along with its associated entities (album, shares, composers etc.) and generate a hash based on the relevant fields.
I believe such a hash can be generated by concatenating the track fields, share fields, composer fields and album fields, and then hashing that (long) string.
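To make that concrete, here's a rough sketch of what I have in mind. The field names and dict shapes are made up for illustration, not our real schema; the point is using a separator and sorting so the hash is stable:

```python
import hashlib

def work_fingerprint(work: dict, composers: list, album: dict, shares: list) -> str:
    """Build a stable hash over the fields that define one musical work."""
    parts = []
    # Sort associated entities by id so that the order they come back from
    # the database in doesn't change the hash, and sort keys within each
    # entity for the same reason.
    entities = [work, album] \
        + sorted(composers, key=lambda c: c["id"]) \
        + sorted(shares, key=lambda s: s["id"])
    for entity in entities:
        for key in sorted(entity):
            parts.append(f"{key}={entity[key]}")
    # Join with a separator character so adjacent fields can't "blend"
    # into each other and produce accidental collisions.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()
```

So renaming a composer from "Bacch" to "Bach" would change the fingerprint of every work that pulls in that composer, which is exactly the "associated entity updated" behaviour I described above.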
The generated hash for each musical work will then be stored in a separate database, which lets the job compare musical works (through their hashes) between the current run and the previous run.
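The comparison step between runs could then be as simple as a diff of two id-to-hash maps (again just a sketch; I'm assuming the previous map is loaded from the hash store and the current one is produced by the daily job):

```python
def diff_hashes(previous: dict, current: dict):
    """Classify works as created, updated, or deleted between two runs.

    Both arguments map a musical-work id to its fingerprint hash.
    """
    created = [wid for wid in current if wid not in previous]
    deleted = [wid for wid in previous if wid not in current]
    updated = [wid for wid in current
               if wid in previous and previous[wid] != current[wid]]
    return created, updated, deleted
```

The three resulting lists would be what gets pushed out in the notifications to the external providers.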
This way, the job will be "data store agnostic" (the entities can come directly from the database, from a web service or somewhere else), and the actual state of the last replicated version of any given musical work will be safely stored as a hash.
Now, I'd like a sanity check on this design. Does this seem to be a fair way to solve this task or am I complicating things?