In the previous installment of this series, we talked about how Azure Service Bus offers us the ability to send our messages to a queue, where each message we send is intended to be handled once, or we can send the message to a topic which allows multiple subscribers to handle each message.
This raises the question of how to decide whether to use a queue or a topic. The answer is that it depends a lot on what sort of message you are sending.
So far in this series we've talked a lot about "messages", which is simply the generic term we use to describe the data that we are transmitting via Service Bus. We've already seen that a message has a "payload" plus metadata. But it's completely up to us what we put in that payload.
Types of messages
There are several types of messages we can send:
- Commands are messages that instruct the recipient to perform a specific task (e.g. SendEmail). They are usually posted to queues, since the task is usually intended to be performed a single time (e.g. the SendEmail command should result in one email being sent).
- Events are messages that inform the recipient that something has happened (e.g. OrderReceived). These are usually posted to topics, since the sender doesn't necessarily know or care about what should happen next. Maybe multiple actions should take place, in which case there would be one subscription per action.
- Requests are messages that ask the recipient to fetch some data and to send it back as another message, called the response. The request is often sent to a queue, and may specify which queue the response should be sent on (which might even be a temporary queue that is deleted after the response message has been sent). Sometimes this pattern is used to implement synchronous behaviour over the inherently asynchronous nature of messaging, by blocking while we wait for the response message to arrive (I generally consider that to be an anti-pattern though - I prefer to just use an HTTP API if I need to synchronously request data).
- Of course, you can send anything you like in a message. Some people simply serialize documents as messages. This might be an entity from a database such as an Order or a Customer. This sort of message doesn't really imply anything about the intent of the message, so it's another pattern I generally avoid.
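To make the difference between queue and topic delivery concrete, here's a minimal in-memory sketch in Python. It does not use the Service Bus SDK at all: `Message`, `InMemoryQueue` and `InMemoryTopic` are hypothetical names invented for illustration, and they only model the delivery semantics described in the list above (one receiver per queue message, one copy per topic subscription).

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Message:
    """A payload plus metadata (hypothetical shape, for illustration only)."""
    payload: dict
    metadata: dict = field(default_factory=dict)


class InMemoryQueue:
    """Each message is taken by exactly one receiver."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Returns the oldest message, or None when the queue is empty.
        return self._messages.popleft() if self._messages else None


class InMemoryTopic:
    """Each subscription gets its own copy of every published message."""

    def __init__(self):
        self._subscriptions = {}

    def subscribe(self, name):
        self._subscriptions[name] = InMemoryQueue()
        return self._subscriptions[name]

    def publish(self, message):
        for subscription in self._subscriptions.values():
            subscription.send(message)


# A command goes to a queue: one handler will act on it.
email_queue = InMemoryQueue()
email_queue.send(Message({"to": "user@example.com"}, {"type": "SendEmail"}))

# An event goes to a topic: every subscription sees it.
orders_topic = InMemoryTopic()
audit = orders_topic.subscribe("audit")
notify = orders_topic.subscribe("notify")
orders_topic.publish(Message({"order_id": 42}, {"type": "OrderReceived"}))
```

Receiving from `email_queue` yields the command once and then nothing, whereas both `audit` and `notify` each receive the same OrderReceived event.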
Let's focus a bit more on command and event messages, which I consider to be the two most useful.
As we've already said, command messages request that an action is performed, and are usually sent to a queue.
My preference is for the command handlers to be as dumb as possible. In other words, the code sending the command should have already made any business logic decisions about exactly what needs to be done. The command message should therefore contain enough information for the handler to simply perform the action.
It's very important for queue message handlers to be idempotent. That is to say, if for whatever reason the command message gets handled twice, it should have the same effect as handling it once. Your chances of achieving this will greatly improve if you adhere to the single responsibility principle and ensure that your message handler only does one thing. That way it's much easier to reason about what would happen if the message handler failed halfway through and had to be retried.
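One common way to get idempotency is to remember which messages have already been handled and skip duplicates. The sketch below assumes each command carries a unique message id; `EmailSender` and its fields are hypothetical, and the in-memory set stands in for what would need to be durable storage in a real system.

```python
class EmailSender:
    """Hypothetical handler for a SendEmail command."""

    def __init__(self):
        self.sent = []              # emails actually sent
        self._handled_ids = set()   # ids of commands already completed

    def handle(self, message_id, to, subject):
        """Send the email once per message id; return False for duplicates."""
        if message_id in self._handled_ids:
            return False  # duplicate delivery: do nothing
        self.sent.append((to, subject))  # stands in for really sending
        self._handled_ids.add(message_id)
        return True


sender = EmailSender()
first = sender.handle("msg-1", "user@example.com", "Your order")
second = sender.handle("msg-1", "user@example.com", "Your order")  # redelivery
```

Because the handler does only one thing, there is only one place it can fail between doing the work and recording that it was done, which keeps the retry behaviour easy to reason about.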
Often commands can be raised in a "fire and forget" manner. In other words, the sender just requests that the action is taken, but doesn't need to be notified when the action has been completed. That keeps things simple, but there may be occasions when you need to know when the action has completed, or maybe take mitigating action if the requested task could not be performed for some reason. In these cases, you might permit the message handlers to raise some kind of "done" event. However, this approach should be treated with caution as it can result in using messages to build a poor man's workflow engine, something that is better achieved with a dedicated framework like Durable Functions (which I've created a Pluralsight course on).
Sometimes with commands, people want a way to prioritize them. Some command messages should be processed urgently, maybe because an end user is waiting for that task to complete, while other command messages represent "background" actions that can wait. Queues are inherently first-in first-out, and don't natively support prioritization. There are a few approaches you can take to this. One of the simplest is to have two queues - a high priority queue and a low priority queue. (It is possible to implement roughly the same thing using a topic with multiple subscriptions, using metadata on the message to filter it to the appropriate subscription, although I prefer just to use queues.)
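Here's a rough sketch of the two-queue idea, using plain in-memory deques rather than real Service Bus queues. The command names and the `next_command` helper are made up for illustration, and "always drain high priority first" is just one possible policy.

```python
from collections import deque

high_priority = deque()
low_priority = deque()


def next_command():
    """Return the next command, preferring the high priority queue."""
    if high_priority:
        return high_priority.popleft()
    if low_priority:
        return low_priority.popleft()
    return None


low_priority.append("RebuildSearchIndex")
high_priority.append("SendPasswordResetEmail")
low_priority.append("ArchiveOldOrders")

# Drain both queues and record the order in which commands were taken.
order = []
while (command := next_command()) is not None:
    order.append(command)
```

Note that with this strict policy a steady stream of urgent commands could starve the background ones. Giving each queue its own listener, each with its own concurrency limit, is one way to avoid that.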
If your prioritization needs become more complex than simply high and low priorities, then it may be that queues are not the most appropriate technology, and the "commands" could be stored in a database, allowing for much richer and more complex sorting to determine what should be done next. Of course if you take this approach, you lose out on some of the benefits of queues, and often end up having to write more complex code, so this should only be attempted if you really need the flexibility to handle commands in arbitrary orders.
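As an illustration of the database approach, here's a small sketch using an in-memory SQLite database. The table layout, the numeric `priority` column and the `take_next` helper are all assumptions made for the example, not a recommended schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE commands ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL,"
    " priority INTEGER NOT NULL,"
    " done INTEGER NOT NULL DEFAULT 0)"
)
db.executemany(
    "INSERT INTO commands (name, priority) VALUES (?, ?)",
    [("ArchiveOldOrders", 1), ("SendInvoice", 5), ("ResizeImage", 5)],
)


def take_next():
    """Pick the highest priority pending command, oldest first on ties."""
    row = db.execute(
        "SELECT id, name FROM commands WHERE done = 0"
        " ORDER BY priority DESC, id ASC LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    db.execute("UPDATE commands SET done = 1 WHERE id = ?", (row[0],))
    return row[1]


taken = [take_next(), take_next(), take_next(), take_next()]
```

This single-worker version hides the hard part: once several workers poll the same table, you need locking or some other claim mechanism to stop two of them taking the same row, which is exactly the kind of extra code a queue would otherwise save you from writing.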
Another important consideration with any message handlers is what level of parallelization you require. How many command messages can or should be handled simultaneously? In many cases, it's quite safe to process dozens of messages at the same time. But you do need to consider whether concurrent message handling could overwhelm a downstream resource. We've already seen that the Service Bus SDK lets us specify how many concurrent messages should be handled by a listener process.
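To show the general idea of capping concurrency (this is not the Service Bus SDK setting itself, just the same concept in plain Python), here's a sketch using an `asyncio.Semaphore`. The limit of 3 and the `process_all` function are assumptions for the example, and the sleep stands in for a call to a downstream resource.

```python
import asyncio

MAX_CONCURRENT = 3  # assumed limit, chosen to protect a downstream resource


async def process_all(messages):
    """Handle every message, never running more than MAX_CONCURRENT at once."""
    limiter = asyncio.Semaphore(MAX_CONCURRENT)
    in_flight = 0
    peak = 0

    async def handle(message):
        nonlocal in_flight, peak
        async with limiter:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for the real work
            in_flight -= 1

    await asyncio.gather(*(handle(m) for m in messages))
    return peak  # the highest number of handlers seen running together


peak_concurrency = asyncio.run(process_all(range(10)))
```

Even though ten messages are submitted together, the recorded peak never exceeds the limit, so the downstream resource only ever sees a bounded number of simultaneous calls.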
Finally, another question that arises is whether to have a queue per message type (e.g. one queue for SendEmail commands, and another queue for CreateZipFile commands), or one "command queue" that you put all command messages onto. There is no one right answer to this question. With a queue per message type there is more potential for parallel operations and more control over prioritization. I often go for a compromise approach where similar commands (e.g. commands that result in blob storage operations, or commands that result in communication with a specific web API) are put on the same queue, rather than going to the extreme of just one queue or having dozens of queues.
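The compromise approach can be as simple as a lookup from command type to queue name. In this sketch the queue names, and the SendSms and DeleteBlob commands, are hypothetical examples added to show the grouping.

```python
# Similar commands share a queue; unrelated ones are kept apart.
QUEUE_FOR_COMMAND = {
    "SendEmail": "notifications",
    "SendSms": "notifications",        # hypothetical command
    "CreateZipFile": "blob-storage",
    "DeleteBlob": "blob-storage",      # hypothetical command
}


def queue_for(command_type):
    """Return the name of the queue a command of this type is sent to."""
    return QUEUE_FOR_COMMAND[command_type]
```

Keeping the mapping in one place also makes it cheap to split a busy group into its own queue later on.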
Whilst command messages have their place, my preference when designing an architecture is to make use of event messages wherever possible. Sometimes these are called "domain events", as each message describes some interesting thing that happens in the domain of the microservice that raises them.
Domain events are particularly great for cross-cutting concerns like auditing. If I want to generate an audit log, I can simply subscribe to the relevant "event" messages and turn them into audit messages, without that concern having to be littered in multiple places through the code. Another example would be user notifications - often users might want to get emailed or texted when certain things happen in the system, and events are a great way of triggering such behaviour.
Try to avoid command messages masquerading as events. Sometimes I see an event named something like "SendEmailRequested". An event like that is too opinionated about what the subscribers should do. Instead raise an event that says what happened, and if the right thing to do is send an email in response to that, then a subscription can be created to send the email.
One challenge with event messages is deciding how much information they should contain. Imagine an event message contains a user id, but all (or most of) the subscribers also need to know the user name and email address. If the message only contains the user id, then all the subscribers will simultaneously need to look up the same user name and email address, which can have a performance impact.
However, if you decide that the message should contain the user name and email address, you are in danger of bloating the message. The code raising the event also might not have that information to hand, and so would have to perform a lookup itself. This can negate some of the performance benefits of messaging, as you now have to make an extra network call before sending the outgoing message.
I can't say I know what the right answer is here, and I generally take a compromise approach. If the publisher of the event happens to have information to hand that would likely be useful to all or most subscribers, then I put it into the event. But if the publisher doesn't have that information to hand, then I'm fine with requiring the subscribers to look it up themselves, even if that might result in multiple subscribers making the same downstream query. Caching can often help alleviate some of the performance impact in this situation.
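As a small illustration of the caching point, here's a sketch in which two subscribers both need user details for the same event. The `get_user` lookup, the subscriber functions and the returned user data are all made up, and `functools.lru_cache` only helps here because both subscribers run in the same process.

```python
from functools import lru_cache

lookup_calls = 0  # counts how often the "expensive" lookup really runs


@lru_cache(maxsize=1024)
def get_user(user_id):
    """Stand-in for a call to a user service or database."""
    global lookup_calls
    lookup_calls += 1
    return {"id": user_id, "name": "Example User", "email": "user@example.com"}


def send_notification(event):
    return get_user(event["user_id"])["email"]


def write_audit_entry(event):
    return get_user(event["user_id"])["name"]


event = {"type": "OrderReceived", "user_id": 7}
email = send_notification(event)
name = write_audit_entry(event)
```

Both subscribers get the data they need, but the underlying lookup only happens once. If the subscribers live in separate processes you would need a shared cache instead, and in either case you have to decide how stale the cached data is allowed to be.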
Often an application might publish all of its domain events to a single topic, and Azure Service Bus allows subscriptions to either receive all events, or set up "filtered" subscriptions to allow them to only receive the specific events they are interested in. One interesting question is whether in a microservices architecture each microservice should have its own topic, or whether it's OK for all microservices to publish to a single topic, which becomes a giant stream of events. I've heard a few different opinions on which is best, and to be honest I'm not sure there's one right answer. You might start with a single shared topic, but break it up into something more granular as your system grows much larger and the volume of messages becomes overwhelming.
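The idea of filtered subscriptions on a shared topic can be sketched in a few lines of plain Python. This models the concept only: the `Topic` class, the predicate-based filter and the UserRegistered event are assumptions for the example and are not how Service Bus filters are actually declared.

```python
class Topic:
    """In-memory topic whose subscriptions can each have a filter."""

    def __init__(self):
        self._subscriptions = []  # (filter predicate, inbox list) pairs

    def subscribe(self, accepts=lambda event: True):
        """Create a subscription; by default it accepts every event."""
        inbox = []
        self._subscriptions.append((accepts, inbox))
        return inbox

    def publish(self, event):
        for accepts, inbox in self._subscriptions:
            if accepts(event):
                inbox.append(event)


topic = Topic()
everything = topic.subscribe()
orders_only = topic.subscribe(lambda e: e["type"] == "OrderReceived")

topic.publish({"type": "OrderReceived", "order_id": 1})
topic.publish({"type": "UserRegistered", "user_id": 7})  # hypothetical event
```

The unfiltered subscription ends up with both events, while the filtered one only sees OrderReceived.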
As with queues, the single responsibility principle still matters. If there are three things you need to do in response to a single domain event message, then create three subscriptions, and let each one perform one of the actions. This makes it easier to write idempotent code, and also gets things done faster thanks to parallelism. One thing you need to beware of is re-publishing a domain event to a topic because you want to trigger one of its subscribers again (e.g. as part of some kind of error recovery process). That would result in all the subscribers running for a second time, potentially producing unwanted results.
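The re-publishing pitfall is easy to demonstrate with a toy example. Here three hypothetical subscribers (audit, email and invoice) are modelled as plain functions, and a counter records how many times each one runs.

```python
from collections import Counter

runs = Counter()  # how many times each subscriber has run


def make_subscriber(name):
    def handle(event):
        runs[name] += 1
    return handle


subscribers = {
    name: make_subscriber(name) for name in ("audit", "email", "invoice")
}


def publish(event):
    """Deliver the event to every subscriber."""
    for handle in subscribers.values():
        handle(event)


event = {"type": "OrderReceived", "order_id": 1}

# Suppose only the email step needs another attempt.
publish(event)
publish(event)  # re-publishing reruns audit and invoice as well
runs_after_republish = dict(runs)

# Retrying just the one subscriber leaves the others untouched.
runs.clear()
publish(event)
subscribers["email"](event)
runs_after_targeted_retry = dict(runs)
```

After re-publishing, every subscriber has run twice, so a non-idempotent invoice step would have produced a duplicate. Retrying only the subscriber that needs it avoids that.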
Finally, as I mentioned earlier when discussing commands, it's very tempting to start to chain handlers together into primitive "workflows" with each subsequent stage in the workflow being triggered by a domain event message. This can result in fragile and hard-to-debug workflows, so if you find yourself doing that kind of thing, consider whether a dedicated workflow framework like Azure Durable Functions is a better fit.
There are many types of message you might choose to send, but commands and events are two of the most important, and your choice of topics or queues usually depends on what sort of message you're sending. It's worth investing some time up front deciding what messages make most sense for your application, which may well be a mixture of commands and events.
Stay tuned for the next installment in this series coming soon...