distributed application design - using message broker

Question

Background

I'm trying to design a layer/component in my distributed application that will communicate between component A and component B.

Right now, this "communication" is accomplished by replicating an entire database from the server where component A lives over to component B. But this design has proved to be very brittle and too tightly coupled. I want to recommend to the team that we move to a messaging solution where A notifies B of a change of state of the data.

Questions

But here's the question. Depending on the size of the data, I can either

1. send the data as part of the message payload,
2. or just notify B that there's something for B to do / to grab.

Clearly, option 1 is easier because I won't need to write too much code in my subscriber logic. However, the reality is that sometimes (not too often) the size of the data may exceed the maximum allowable size of a message. (I'm going to use MQTT to start my prototype, which has a max payload size of 268435455 bytes.)

As far as design is concerned, I realize it's best for each service / component (A and B) to be autonomous... and to just "know" what it has to do when it gets a message. But in practical terms, what does this mean?

For example, consider this use case:

"A" publishes a message that a new widget has just been created in its database. "B" is a subscriber on the "widget" channel. It gets the message that includes the following:

- action:CREATE
- widgetID: 123

To get the data about the widget, should B:

- call a REST API running on server A? (e.g. https://myserver/mywidgetAPI/getwidgetdetailsByID/123)
- be able to grab the widget data from the message payload?

Other questions would include:

- When should I consider the message broker to be successful? When the message is delivered? Or when B has a copy of the data in question?
- Any good articles you know of that I can read to help with this type of design?
- What if I just avoided messaging altogether and wrote a CRUD interface for component B? "A" would call the CRUD interface whenever a widget was changed. What's the benefit of using a message broker over this type of solution?

JimmyJames · Answer 1 · 2017-06-20T21:07:12.050

First some baseline realities of queues:

Queues are a powerful tool but they also introduce complexity. The big thing you need to solve for is what happens if a message on a queue is not handled successfully. A naive implementation will read from the queue, commit (and delete the message) and then try to process. If an error occurs, the message is gone. You can actually use this approach if it's not crucial that every message is processed. For example, if you have a ledger of all things that are needed and it's OK to periodically clean-up. If you must process every message in order, this is a big problem.

You can also decline to commit the message off the queue until you have successfully processed it. This is more robust but there's an issue of poison messages. If you simply roll back on error, you can end up in an infinite loop of reading the same message off the queue. Meanwhile more messages are piling up and eventually ka-blammo! This will happen in the middle of the night while you are riding a unicorn through the lollipop forest. So you need poison message handling. But if you are required to process each message in order, this is still not going to help.
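To make the commit-after-processing idea concrete, here is a minimal sketch using only Python's standard library. It is not tied to any particular broker: `drain`, `MAX_ATTEMPTS`, the `dead_letters` queue and the message shape (`id`, `action`) are all hypothetical names chosen for illustration, and an in-memory `queue.Queue` stands in for a real queue.

```python
import queue
from collections import Counter

MAX_ATTEMPTS = 3  # assumed retry budget; tune for your system


def drain(work, dead_letters, handler, max_attempts=MAX_ATTEMPTS):
    """Process every message in `work`, parking repeat failures in `dead_letters`.

    Returns the ids of the messages that were handled successfully.
    """
    attempts = Counter()
    processed = []
    while True:
        try:
            msg = work.get_nowait()
        except queue.Empty:
            return processed
        try:
            handler(msg)
        except Exception:
            attempts[msg["id"]] += 1
            if attempts[msg["id"]] >= max_attempts:
                # Poison message: stop retrying so it cannot block the queue.
                dead_letters.put(msg)
            else:
                # "Roll back": make the message available again.
                work.put(msg)
        else:
            # "Commit" only after the handler succeeded.
            processed.append(msg["id"])
```

Note that the retried message goes to the back of the queue, so later messages overtake it. That is exactly why this kind of handling does not help when strict ordering is required.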

It's not clear whether you want queues or topics. One of the strengths of Kafka is that you don't have to choose because one abstraction covers both approaches. In standard queuing platforms you must decide this early, possibly before you really understand the problem.

I would say that option 2 is much more robust. What I would probably do is make the RESTful interface able to provide any and all events that have not been processed (at the very least). For example, if you can see the previous event on the current one, your receiver will know if it is processing out of order. Then, if a message or two doesn't get handled, it's no big deal. You either wait for the next message or force an update.
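A rough sketch of that idea, under some assumptions of mine: events carry a monotonically increasing `seq` number (one way of letting the receiver spot gaps), and `EventSource` is an in-memory stand-in for A's REST API. None of these names come from a real library; a real receiver would make an HTTP call where `events_after` is used.

```python
class EventSource:
    """Hypothetical stand-in for A's API: an append-only event log."""

    def __init__(self):
        self._events = []

    def append(self, action, widget_id):
        # Sequence numbers start at 1 and never repeat.
        event = {"seq": len(self._events) + 1, "action": action, "widget_id": widget_id}
        self._events.append(event)
        return event

    def events_after(self, seq):
        """Return every event newer than `seq`, oldest first."""
        return [e for e in self._events if e["seq"] > seq]


class Receiver:
    """B's side: treats each message as a hint and pulls what it has missed."""

    def __init__(self, source):
        self.source = source
        self.last_seq = 0   # highest event applied so far
        self.widgets = {}   # widget_id -> last action seen (toy state)

    def on_notification(self, note):
        if note["seq"] <= self.last_seq:
            return  # stale or duplicate message, nothing to do
        self.catch_up()

    def catch_up(self):
        """Apply all unprocessed events in order; also usable as a forced update."""
        for event in self.source.events_after(self.last_seq):
            self._apply(event)
            self.last_seq = event["seq"]

    def _apply(self, event):
        if event["action"] == "DELETE":
            self.widgets.pop(event["widget_id"], None)
        else:
            self.widgets[event["widget_id"]] = event["action"]
```

With this shape a lost or reordered notification is harmless: the next one that arrives, or a periodic call to `catch_up()`, brings B back in line with A.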

The most rigorous answer for when to consider the message delivered is when it has been committed. If you really want to do this the 'right way' you use 2-phase commit. This stuff is all really sound but it can also be a bit tricky to set up. My preference would be for building reliable systems that work when the components are unreliable as opposed to depending on the reliability of things like your queuing infrastructure. If that sounds weird, consider that TCP (reliable) is built upon IP (unreliable).

I don't know of a specific article but if you haven't visited this site, you should start there.

I'm not sure what you mean by "message broker" here since all we've discussed so far is a simple queue. There is a message broker pattern but I don't think this qualifies. I would say queues are useful in situations where you are processing work asynchronously especially when you want to distribute the load over a number of consuming systems. Even when those conditions are met, I would stick with small messages. Trying to send large amounts of data on queues has not been a winning strategy in my experience. I would not attempt it again. IMO 'reliable messaging' is a misnomer and so is 'guaranteed delivery'.

Your idea of having an API on B that A calls could work but what happens if B cannot be reached at the moment or chokes on the call (e.g. too much volume)? One big advantage of a queue is that it can act as a large buffer with independent storage. A doesn't depend on B in order to put messages on the queue. B can be down and the messages will be held until it is ready to handle them. Of course, the queuing system could be unavailable which is why you need it to be very robust. These are reasons again to avoid large messages. The smaller the messages, the easier (and cheaper) it is to store more of them for longer. Large messages are also more likely to create issues with the queuing system too.
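The buffering point can be illustrated with a toy comparison. This is only a sketch: `ServiceB`, `call_directly` and `deliver_buffered` are made-up names, and a `collections.deque` stands in for the queue's independent storage.

```python
from collections import deque


class ServiceB:
    """Toy model of component B that can be switched on and off."""

    def __init__(self):
        self.up = False
        self.seen = []

    def handle(self, msg):
        if not self.up:
            raise ConnectionError("B is unavailable")
        self.seen.append(msg)


def call_directly(b, messages):
    """A calls B's API itself; returns the messages B never received."""
    lost = []
    for msg in messages:
        try:
            b.handle(msg)
        except ConnectionError:
            lost.append(msg)  # A must now retry or remember these itself
    return lost


def deliver_buffered(buffer, b):
    """B drains the buffer; a message is removed only after it was handled."""
    while buffer:
        b.handle(buffer[0])  # raises if B is still down, leaving the message queued
        buffer.popleft()
```

In the direct-call version, A has to own the retry logic whenever B is down. With the buffer, A only needs the queue to be available, and B catches up whenever it comes back.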

thanks for the response. In most cases I don't have to worry about sequence. Just as long as the final state of the widget looks the same on both databases... i'm good. — dot, Jun 20 '17 at 17:05
that simplifies things greatly. What would be the impact of missing an update and how long can you tolerate missing one? — JimmyJames, Jun 20 '17 at 18:29
It would have a big impact if something was missing... but as long as it was rectified within 10 minutes i think it'd be fine. Also, I've added another question at the bottom of my post, if you wouldn't mind commenting? — dot, Jun 20 '17 at 19:15
