Suppose we have several microservices, and a saga runs a transaction that spans 6 of them.
What if the whole system dies (an unexpected shutdown) in the middle of the saga, at step 4? (The system died, so its state is lost.)
That’s not the way a saga works:
If the system fails between two steps, then when it is restarted, processing simply resumes where it left off: the event queue is reliably persisted, and the next step is triggered by the event already on the queue.
If the system fails in the middle of a step, then when it is restarted, either the step resumes (if the state of the step can be restored) or the step is rolled back (since it is managed transactionally). Which of the two happens depends on how you have designed your saga and its steps to handle node failure.
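To make the two cases concrete, here is a minimal sketch of a saga that survives a crash at step 4. It is an illustration under assumptions, not how any particular saga framework works: the step names, the `saga_log` and `effects` tables, and the `crash_at` parameter are all hypothetical, and a single SQLite database stands in for the persisted queue and for each service's local transaction.

```python
import sqlite3

STEPS = [f"step{i}" for i in range(1, 7)]  # six hypothetical saga steps

def open_store(path):
    """Open the durable store; both tables are illustrative, not a real schema."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS saga_log "
        "(saga_id TEXT, step TEXT, PRIMARY KEY (saga_id, step))"
    )
    db.execute("CREATE TABLE IF NOT EXISTS effects (saga_id TEXT, step TEXT)")
    db.commit()
    return db

def completed_steps(db, saga_id):
    """Return the set of steps already recorded as done for this saga."""
    rows = db.execute("SELECT step FROM saga_log WHERE saga_id = ?", (saga_id,))
    return {row[0] for row in rows}

def run_saga(db, saga_id, crash_at=None):
    """Run (or resume) the saga, skipping steps already recorded as done."""
    done = completed_steps(db, saga_id)
    for step in STEPS:
        if step in done:
            continue
        # One transaction per step: the step's effect and its log entry commit
        # together, or both roll back if the step fails midway.
        with db:
            db.execute("INSERT INTO effects VALUES (?, ?)", (saga_id, step))
            if step == crash_at:
                raise RuntimeError(f"simulated crash in {step}")
            db.execute("INSERT INTO saga_log VALUES (?, ?)", (saga_id, step))
```

Calling `run_saga(db, "s1", crash_at="step4")` leaves steps 1 to 3 committed and step 4 rolled back; calling `run_saga(db, "s1")` afterwards picks up at step 4 and finishes the remaining steps without repeating the first three.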
Of course, this is greatly simplified, because distributed processing is very complex and needs very careful design (e.g. what if you relaunch a step on a new instance, but the old instance manages to recover, with the risk of having things processed twice?).
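One common way to guard against the "processed twice" risk is to make each step idempotent, so that a second execution is detected and ignored. The sketch below is only one possible approach and rests on assumptions: the `processed` and `balance` tables, the `debit_once` function, and the starting balance of 100 are made up for illustration.

```python
import sqlite3

def open_dedup_store(path=":memory:"):
    """Create a hypothetical store with a dedup table and a single account."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS processed "
        "(saga_id TEXT, step TEXT, PRIMARY KEY (saga_id, step))"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS balance (id INTEGER PRIMARY KEY, amount INTEGER)"
    )
    db.execute("INSERT OR IGNORE INTO balance VALUES (1, 100)")
    db.commit()
    return db

def debit_once(db, saga_id, step, amount):
    """Apply the debit only if this (saga_id, step) was not processed before."""
    with db:  # the dedup marker and the debit commit in the same transaction
        cur = db.execute(
            "INSERT OR IGNORE INTO processed VALUES (?, ?)", (saga_id, step)
        )
        if cur.rowcount == 0:
            return False  # duplicate delivery: the marker already exists
        db.execute("UPDATE balance SET amount = amount - ? WHERE id = 1", (amount,))
        return True

def current_balance(db):
    return db.execute("SELECT amount FROM balance WHERE id = 1").fetchone()[0]
```

If both the relaunched instance and the recovered old instance call `debit_once(db, "s1", "step4", 10)`, only the first call changes the balance; the second sees the existing marker and returns `False`.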