When an application A communicates with an external/third-party system B, there is always a chance that B is down.
Say that A raises an event that should be sent as a message to B via HTTP. What is the best way to guarantee that the message is delivered?
One possibility is of course to have some retry logic to resend the message a few times. But what should we do with the message if delivery fails too many times? Or if A crashes (maybe due to too many messages waiting to be sent)? Then we need a way to persist those messages so that delivery can resume after A has recovered.
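To make the retry part concrete, here is roughly what I have in mind (just a sketch; the endpoint URL, payload shape, and function name are all made up for illustration):

```python
import time
import requests

def send_with_retry(url, payload, max_attempts=3, backoff_seconds=2):
    """Try to POST the message a few times before giving up."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = requests.post(url, json=payload, timeout=10)
            response.raise_for_status()
            return True  # delivered and acknowledged by B
        except requests.RequestException:
            if attempt < max_attempts:
                time.sleep(backoff_seconds * attempt)  # simple linear backoff
    return False  # caller must decide what to do with the undelivered message
```

The `return False` at the end is exactly my open question: what happens to the message then?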
My first idea was to store all events in a dedicated table in the database and mark them off when they are sent. Then a colleague argued that we can't always rely on the database and that we should instead store the messages locally on the filesystem. But with the latter approach it looks like we'd be implementing a message queue ourselves, and we'd be better off with a real full-fledged message queue (which we currently don't have for this application).
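For the database idea, this is roughly the shape I was picturing (a rough sketch only, using SQLite for brevity; the table and function names are invented):

```python
import sqlite3

conn = sqlite3.connect("app.db")

# Dedicated table for outgoing events: a row is written when the event is
# raised, and sent_at stays NULL until delivery to B is confirmed.
conn.execute("""
    CREATE TABLE IF NOT EXISTS outbox (
        id         INTEGER PRIMARY KEY,
        payload    TEXT NOT NULL,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP,
        sent_at    TEXT
    )
""")

def enqueue(payload):
    # Can run in the same local transaction as the business change that
    # produced the event, so the event can't be lost between the two.
    with conn:
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))

def deliver_pending(send):
    # `send` is the actual HTTP delivery, e.g. the send_with_retry sketch above.
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent_at IS NULL ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        if send(payload):
            with conn:
                conn.execute(
                    "UPDATE outbox SET sent_at = CURRENT_TIMESTAMP WHERE id = ?",
                    (row_id,),
                )
```

A background job would then call `deliver_pending` periodically, passing in the HTTP sender, e.g. `deliver_pending(lambda p: send_with_retry(B_URL, p))` (with `B_URL` being wherever B listens). Unsent messages survive a crash of A because they sit in the table until marked sent.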
The same colleague then argued that even if we have a message queue, we can't be sure that the message is delivered to the queue, so we'd still need to implement a queue on the filesystem. That really seems like overkill to me: it would mean that, to be really sure, we'd have to implement a locally stored message queue for all communication, even between our own microservices.
For context: this is a low-volume system with few messages per day (at most in the hundreds), but they have very high value (they are used for billing), so we don't want to miss any.
Any thoughts?