Yes! This is a problem. First, the easy cases:
For pub/sub within a process (cf. the Observer Pattern), failing to properly de-register subscribers is a memory leak, because the reference from the publisher to the subscriber would keep the subscriber alive forever. This can be prevented with weak references, which do not keep the subscriber alive but instead become null once the subscriber is garbage-collected.
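A minimal sketch of this in Python, using the standard weakref module (the Publisher/Subscriber names are just illustrative):

```python
import weakref

class Publisher:
    """In-process pub/sub where subscriptions never leak memory."""

    def __init__(self):
        # A WeakSet holds its elements through weak references:
        # once a subscriber is garbage-collected, it silently drops
        # out of the set, so a forgotten unsubscribe call no longer
        # pins the subscriber in memory.
        self._subscribers = weakref.WeakSet()

    def subscribe(self, subscriber):
        self._subscribers.add(subscriber)

    def publish(self, message):
        # Copy before iterating, since the set can shrink mid-loop
        # if a subscriber is collected.
        for subscriber in tuple(self._subscribers):
            subscriber.on_message(message)

class Subscriber:
    def on_message(self, message):
        print(f"received: {message}")

pub = Publisher()
sub = Subscriber()
pub.subscribe(sub)
pub.publish("hello")   # delivered
del sub                # subscriber dropped without unsubscribing...
pub.publish("again")   # ...no delivery attempt, no leak (in CPython
                       # the subscriber is freed immediately)
```

One caveat: weak references to bound methods die immediately, so callback-style subscriptions need weakref.WeakMethod instead.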
If the subscription uses a persistent connection (e.g. over TCP), the subscription can be cancelled when the connection times out. Timeouts are detectable as long as the participants send keep-alive messages to each other.
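The publisher-side bookkeeping for this could look roughly like the following sketch, assuming application-level keep-alives; the interval and timeout values are arbitrary:

```python
import time

KEEPALIVE_INTERVAL = 5.0              # assumed: peers ping every 5 s
TIMEOUT = 3 * KEEPALIVE_INTERVAL      # tolerate two missed pings

class SubscriptionTable:
    """Publisher-side bookkeeping for detecting dead subscribers."""

    def __init__(self):
        self._last_seen = {}  # subscriber id -> time of last keep-alive

    def heard_from(self, subscriber_id):
        # Call on every message from the subscriber, including
        # otherwise-empty keep-alives.
        self._last_seen[subscriber_id] = time.monotonic()

    def reap(self):
        # Run periodically: cancel any subscription that has been
        # silent for longer than the timeout.
        now = time.monotonic()
        for sid, seen in list(self._last_seen.items()):
            if now - seen > TIMEOUT:
                del self._last_seen[sid]
```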
The difficult problems arise when we don't have a persistent connection. Here, a variety of approaches exist along a spectrum:
Do not guarantee delivery. If a subscriber is offline, they will miss messages. This can be a surprisingly effective solution if the problem domain permits it.
Keep a log of all messages, and let subscribers request re-transmission of all messages after some sequence number (see the sketch after this list). The publisher doesn't need to know the status of offline subscribers. This approach seems to work well for massively distributed systems with low data transfer rates, e.g. blockchains.
Arguably the same but more extreme: if messages are published frequently but read rarely, don't push messages but let the reader poll for updates, i.e. don't actually use pub/sub. This works well e.g. for social media subscriptions.
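To make the log-and-replay idea concrete, here is a minimal sketch; the class and method names are illustrative, not from any particular system:

```python
class MessageLog:
    """Publisher-side log; subscribers replay from a sequence number."""

    def __init__(self):
        self._log = []  # list index doubles as the sequence number

    def append(self, message) -> int:
        self._log.append(message)
        return len(self._log) - 1   # sequence number just assigned

    def replay_after(self, seq):
        # A reconnecting subscriber asks: "give me everything after
        # the last sequence number I saw."
        return self._log[seq + 1:]

log = MessageLog()
for m in ("a", "b", "c"):
    log.append(m)

last_seen = 0                       # subscriber went offline after "a"
assert log.replay_after(last_seen) == ["b", "c"]
```

The nice property is that the publisher is stateless with respect to subscribers: each subscriber tracks its own position in the log.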
I worked on banking software that had to deal with services that can crash. Our message bus only guaranteed delivery to currently connected subscribers, but subscribers needed to see all messages. Our solution was to have the sender log all messages in a database. When a subscriber booted, it would connect to the message bus and start buffering most messages, though it could act on some messages immediately. Meanwhile, the subscriber would read the missed messages from the database. Once the sequence number of a message in the database matched a buffered message, the subscriber was caught up.

However, this only worked because we had a second communication channel for non-realtime communication, and because processing the message backlog was fast compared to the rate at which new messages arrived. In principle, the message buffer could have been a circular buffer that overwrites old messages, since the messages would eventually have been read from the database.
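The catch-up logic looked roughly like the following single-threaded sketch (names and structure are reconstructed for illustration, not the actual code; database.read_messages() is assumed to yield (seq, message) pairs in order):

```python
from collections import deque

class CatchUpSubscriber:
    """Buffers live traffic while replaying the backlog from a database."""

    def __init__(self, database):
        self._db = database        # second, non-realtime channel
        self._buffer = deque()     # live messages held back during catch-up
        self._caught_up = False

    def on_live_message(self, seq, message):
        # Called by the message bus for every live message.
        if self._caught_up:
            self._process(seq, message)
        else:
            self._buffer.append((seq, message))

    def replay_backlog(self):
        # Read missed messages from the database until their sequence
        # numbers meet the head of the live buffer.
        for seq, message in self._db.read_messages():
            if self._buffer and seq >= self._buffer[0][0]:
                break              # backlog has caught up to live traffic
            self._process(seq, message)
        while self._buffer:        # drain the held-back live messages
            self._process(*self._buffer.popleft())
        self._caught_up = True     # from now on, act on messages directly

    def _process(self, seq, message):
        print(f"handling #{seq}: {message}")
```

In a concurrent setting the hand-off from draining the buffer to going live would need synchronization, but the sequence-number comparison is what guarantees no message is processed twice or skipped.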