I'm unsure about the best way to keep two data sources synchronized in a distributed system. In my case, a service checks a repository for expired jobs; when a job has expired, it is removed from the repository and enqueued on a distributed queue (the example is in Python, but it should be easy to follow):
```python
def check_expired_jobs(self):
    jobs = self._job_repository.all()
    for job in [job for job in jobs if job.has_expired()]:
        self._job_queue.enqueue(job.crawl_task)
        self._job_repository.delete(job)
```
My concern is that a lot can go wrong here, since both the queue and the repository are remote data sources. If the enqueue succeeds but the repository deletion then fails for whatever reason, I end up in an inconsistent state: the job is still in the repository, so the next sweep will enqueue it again. This isn't the first time I've run into this kind of problem, and I'd like to tackle it properly this time.
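One mitigation I've sketched (class and method names here are hypothetical, mirroring my snippet above) is to keep the enqueue-then-delete order and simply accept at-least-once delivery: if the delete fails, the job stays in the repository and gets re-enqueued on the next run, so the queue consumer has to handle duplicate tasks idempotently (e.g. by de-duplicating on a job id). Is this a reasonable direction?

```python
import logging

logger = logging.getLogger(__name__)


class ExpiredJobChecker:
    """Sketch: enqueue first, delete second, tolerate duplicates.

    A failure between the two remote calls leaves the job in the
    repository, so the next sweep retries it. This gives at-least-once
    delivery; the consumer side must be idempotent.
    """

    def __init__(self, job_repository, job_queue):
        self._job_repository = job_repository
        self._job_queue = job_queue

    def check_expired_jobs(self):
        for job in self._job_repository.all():
            if not job.has_expired():
                continue
            try:
                # Enqueue before deleting: a crash between the two steps
                # duplicates the task rather than losing it.
                self._job_queue.enqueue(job.crawl_task)
                self._job_repository.delete(job)
            except Exception:
                # Leave the job in place; the next sweep will retry it.
                logger.exception("failed to hand off job %r", job)
```

The trade-off is that the inconsistency window still exists, but it is now benign by design instead of being a silent bug.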
What would be the best practice to keep several data sources/repositories in sync?