This is a common sequence between two distributed components, A and B, in our Java application:
1. A sends a request to B
2. B starts a job J in a parallel thread
3. B returns a response to A
4. A accepts the response
5. The job finishes after some time
6. The job sends information to A
7. A receives the response from the job and updates its state
This is the ideal scenario, assuming everything works. Of course, real life is full of failures. For example, one of the worst cases is when #6 fails simply because of the network: the job has executed correctly, but A does not know anything about it.
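To make the failure in #6 concrete, here is a minimal in-process sketch. Everything in it is hypothetical and for illustration only: the class names, the `Callback` interface standing in for the network call from the job to A, and the string states. It is not how our components are actually wired, it only shows how A's view goes stale when the notification is lost.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LostCallbackDemo {

    /** Hypothetical stand-in for the network call from the job back to A (step 6). */
    interface Callback {
        void jobFinished(String unitId) throws Exception;
    }

    /** Component A: learns about completion only if the callback reaches it. */
    static class ComponentA implements Callback {
        volatile String view = "IDLE";

        @Override
        public void jobFinished(String unitId) {
            view = "DONE"; // step 7
        }
    }

    /** Component B: runs the job on another thread and answers immediately. */
    static class ComponentB {
        private final ExecutorService pool = Executors.newSingleThreadExecutor();
        volatile String jobState = "NONE";

        String handleRequest(String unitId, Callback toA) {
            jobState = "RUNNING"; // step 2
            pool.execute(() -> {
                jobState = "DONE"; // step 5: the job itself succeeds
                try {
                    toA.jobFinished(unitId); // step 6: may fail on the network
                } catch (Exception e) {
                    // The notification is lost; B is done, A never hears about it.
                }
            });
            return "ACCEPTED"; // step 3
        }

        void awaitJobs() {
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    /** Runs steps 1 to 7 once and reports what each side believes afterwards. */
    static String run(boolean networkFails) {
        ComponentA a = new ComponentA();
        ComponentB b = new ComponentB();
        Callback link = networkFails
                ? unitId -> { throw new IOException("simulated network failure"); }
                : a;
        a.view = "PENDING";               // A is waiting for the job
        b.handleRequest("unit-1", link);  // steps 1 to 4
        b.awaitJobs();                    // wait until steps 5 and 6 have run
        return "A=" + a.view + ", B=" + b.jobState;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // A=DONE, B=DONE
        System.out.println(run(true));  // A=PENDING, B=DONE
    }
}
```

In the second run the job finished correctly on B, but A is stuck in `PENDING`, which is exactly the situation described above.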
I am looking for a lightweight approach to managing errors in this system. Note that we have a lot of components, so clustering them all just for the sake of error handling does not make sense. For the same reason, I have ruled out any distributed memory/repository that would have to be installed on each component.
My thoughts are going in the direction of having one absolute state on B and never having a persisted state on A. This means the following:
- before #1 we mark on A that the work unit, i.e. the change, is about to start
- only B may un-mark this state
- A may fetch info from B at any time to update the state
- no new change on the same unit can be invoked on A
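The idea above could be sketched roughly as follows. This is only one possible reading of it, with assumptions labeled: `ComponentA`, `ComponentB`, `JobState`, `requestChange` and `reconcile` are hypothetical names, the calls between A and B are plain method calls instead of network calls, and "only B may un-mark" is interpreted as "A clears its in-memory mark only when B reports a terminal state".

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class AuthoritativeStateSketch {

    enum JobState { RUNNING, DONE, FAILED }

    /** B owns the single authoritative state per work unit. */
    static class ComponentB {
        private final Map<String, JobState> states = new ConcurrentHashMap<>();

        void start(String unitId)  { states.put(unitId, JobState.RUNNING); }
        void finish(String unitId) { states.put(unitId, JobState.DONE); }

        /** Returns null if B has never heard of the unit. */
        JobState stateOf(String unitId) { return states.get(unitId); }
    }

    /** A keeps only an in-memory mark; nothing is persisted on this side. */
    static class ComponentA {
        private final Set<String> inProgress = ConcurrentHashMap.newKeySet();
        private final ComponentB b;

        ComponentA(ComponentB b) { this.b = b; }

        /** Marks the unit before step 1; refuses a second change on the same unit. */
        boolean requestChange(String unitId) {
            if (!inProgress.add(unitId)) {
                return false; // a change on this unit is already in flight
            }
            b.start(unitId); // step 1 (a network call in the real system)
            return true;
        }

        /** Pulls B's state; the mark is cleared only if B reports a terminal state. */
        void reconcile(String unitId) {
            JobState s = b.stateOf(unitId);
            if (s == JobState.DONE || s == JobState.FAILED) {
                inProgress.remove(unitId);
            }
        }

        boolean isMarked(String unitId) { return inProgress.contains(unitId); }
    }

    public static void main(String[] args) {
        ComponentB b = new ComponentB();
        ComponentA a = new ComponentA(b);

        System.out.println(a.requestChange("unit-1")); // true: unit is now marked
        System.out.println(a.requestChange("unit-1")); // false: change already in flight

        b.finish("unit-1");      // job done, but assume the notification (#6) was lost
        a.reconcile("unit-1");   // A asks B instead of waiting for the callback
        System.out.println(a.isMarked("unit-1"));      // false: B's state un-marked it
    }
}
```

With this shape a lost notification in #6 only delays A: the next `reconcile` call picks up the result from B. What A should do when B does not know the unit at all (for example when the request in #1 itself was lost) is left open in this sketch.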
What do you think? Is there any lightweight way to tame the errors in a system of this kind?