Messaging with Azure Service Bus - Part 2 - The Challenges of Messaging
In this series:
- Part 1 - Why Use Messaging?
- Part 2 - The Challenges of Messaging
- Part 3 - Introducing Azure Service Bus
- Part 4 - Sending and Receiving Messages
- Part 5 - Message Sending Options
- Part 6 - Message Receiving Options
- Part 7 - Topics and Subscriptions
- Part 8 - Commands and Events
- Part 9 - Premium Features
In the previous installment, we looked at some of the benefits of using messaging, which include performance, scaling, resilience and decoupling. But it is possible to run into difficulties with messaging, and it's important that you are aware of some of the challenges you'll face. In this post, we'll look at some of the most common issues you might run into, and then later in this series we'll see some of the ways we can work around those problems.
First, because messaging is inherently asynchronous, the sender of the message doesn't know whether the message was successfully handled. This often means that a more complex mechanism for reporting back to the end user is required. You've already experienced this with many online shopping experiences, where some time after placing your order, you receive an email telling you that the order has been accepted, and there is also likely an order status page you can visit to check on how things are going. So moving more processes to asynchronous operation usually results in additional development work to report the progress of those operations back to the end users.
Second, in an ideal world, we could guarantee that every message we send will be received exactly once. It turns out that this is impossible to guarantee. Instead, many messaging systems (including Azure Service Bus) support the concept of "at least once" delivery. Often this works by requiring the receiver of the message to report back to the message bus once it has finished handling the message. If the message bus doesn't get that acknowledgement within a reasonable period of time, it makes the message visible again, and it can be delivered again. The main way to work around this issue is to make your message handlers idempotent - in other words, handling the same message twice should produce the same outcome as handling it once. This is crucially important if your message handler is taking a payment - we don't want to charge customers twice. But for other types of message handler, it may be less important - e.g. getting two confirmation emails is not necessarily a big problem.
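To make the idea of an idempotent handler more concrete, here is a minimal sketch in Python. It assumes every message carries a unique ID that stays the same on redelivery; the `PaymentHandler` class and its fields are hypothetical names invented for illustration, not part of any Azure SDK.

```python
from dataclasses import dataclass, field


@dataclass
class PaymentHandler:
    """Hypothetical handler that charges at most once per message ID."""

    # In-memory stand-ins; a real system would need a durable store here.
    processed_ids: set = field(default_factory=set)
    charges: list = field(default_factory=list)

    def handle(self, message_id: str, customer: str, amount: float) -> bool:
        # A redelivered message carries the same ID, so we skip it.
        if message_id in self.processed_ids:
            return False
        self.charges.append((customer, amount))
        self.processed_ids.add(message_id)
        return True


handler = PaymentHandler()
handler.handle("msg-1", "alice", 9.99)
handler.handle("msg-1", "alice", 9.99)  # redelivery of the same message: ignored
```

In practice the record of processed IDs would have to be stored durably, and ideally updated in the same transaction as the side effect itself, otherwise a crash between the two steps could still result in a double charge.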
Third, we cannot assume that our messages will be handled in order. Of course, if you just had a single message listener, working on one message at a time, then it is likely that it will receive each message in the order the message was sent. But as soon as you have multiple listeners, you are handling two or more messages at the same time, and so it's possible that message 2's handler completes before message 1's handler does. When you factor in retries, it gets messier - if a transient error causes the first attempt at handling message 1 to fail, then message 1 might come back to be retried after message 2 has been handled. The best way to deal with this issue is to try to create message handlers where order is not important. If order really does matter for your messages, there are some techniques you can use to support that, but they introduce another level of complexity.
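One possible way to make a handler tolerate out-of-order delivery is to have the sender attach a version number to each update, and have the receiver ignore anything older than what it has already applied. The sketch below assumes such a version number exists; `OrderStatusStore` and its fields are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class OrderStatusStore:
    """Hypothetical store that keeps only the newest status per order."""

    # Maps order ID -> (version, status) of the newest update seen so far.
    latest: dict = field(default_factory=dict)

    def apply(self, order_id: str, version: int, status: str) -> bool:
        current = self.latest.get(order_id)
        # Ignore updates that are older than (or the same as) what we have.
        if current is not None and current[0] >= version:
            return False
        self.latest[order_id] = (version, status)
        return True


store = OrderStatusStore()
store.apply("order-1", 2, "shipped")   # message 2 happens to arrive first
store.apply("order-1", 1, "accepted")  # message 1 arrives late and is ignored
```

This only works when a later message fully supersedes an earlier one. If each message carries an incremental change, a different technique is needed.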
Fourth, due to the asynchronous nature of messaging, it's important that you think carefully about message versioning. Suppose we change the format of a message. You might think that we could just update both the sender and recipient to use the new message version. But there might still be instances of the original message version in the queue, even after we've upgraded the code. So you have to ensure that your receiving code can cope with both old and new versions of a message. Also, in microservice environments, we can't (and shouldn't) assume that both the sender and recipient of a message will be upgraded at exactly the same instant.
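As a rough illustration of receiving code that copes with two message versions, the sketch below assumes JSON message bodies with an optional `version` field, where messages without the field are treated as version 1. The message shapes (`name` in v1, `first_name` and `last_name` in v2) and the `parse_customer` function are made up for this example.

```python
import json


def parse_customer(body: str) -> dict:
    """Normalize a hypothetical v1 or v2 customer message to one shape."""
    data = json.loads(body)
    # Assumption: v1 messages were sent before a version field existed.
    version = data.get("version", 1)
    if version == 1:
        # Hypothetical v1 format: a single "name" field.
        first, _, last = data["name"].partition(" ")
        return {"first_name": first, "last_name": last}
    if version == 2:
        # Hypothetical v2 format: separate name fields.
        return {"first_name": data["first_name"], "last_name": data["last_name"]}
    raise ValueError(f"unsupported message version: {version}")


old = parse_customer('{"name": "Ada Lovelace"}')
new = parse_customer('{"version": 2, "first_name": "Ada", "last_name": "Lovelace"}')
```

Both calls produce the same normalized dictionary, so the rest of the handler doesn't need to know which version was on the queue.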
Fifth, we said last time that one of the benefits of messaging is that it allows us to scale out, and handle many messages in parallel. But it's remarkably easy to inadvertently overwhelm downstream services with this approach. For example, if every message handler performs a database operation, and we attempt to handle dozens of messages in parallel, can our database server cope with that load? In practice this means that we need to implement patterns like circuit breakers, or back-off and retry, and in my experience message buses do not always provide much help in implementing such protection.
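A back-off and retry wrapper can be sketched in a few lines. The example below is only a simplified illustration: `call_with_backoff` is a hypothetical helper, it treats `ConnectionError` as the transient failure, and the attempt count and delays are arbitrary values chosen for the example.

```python
import time


def call_with_backoff(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying on ConnectionError with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            # Give up and re-raise once the final attempt has failed.
            if attempt == max_attempts - 1:
                raise
            # Wait 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))


# Usage with a fake downstream call that fails twice, then succeeds.
attempts = []
delays = []


def flaky_database_call():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("database busy")
    return "saved"


result = call_with_backoff(flaky_database_call, sleep=delays.append)
```

The `sleep` parameter is injectable purely so the example can run without actually waiting. A production version would usually also add some random jitter to the delays, so that many handlers don't all retry at the same moment.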
Sixth, although security is not a weakness of messaging, there are some challenges you need to think about. Usually with messaging, anyone who has access to a connection string is able to both publish messages to the bus and receive messages from it. So those credentials need to be very carefully managed. But you might also want to put additional restrictions on your applications, like strict controls over which queues or topics each application is allowed to publish to or receive from. Also, as I discuss in my Building Microservices Pluralsight course, it's possible for a "confused deputy attack" to result in a message being posted that requests an operation that should not have been allowed. Care is therefore needed to ensure that the end user is truly authorized to perform the action requested by the message.
Finally, although in theory you can just dump messages onto a message bus, and trust that they will be handled later, in reality it's not always that straightforward. What if the message bus itself is down? Services like Azure Service Bus do offer good SLAs with high availability - but you do need to think about what you'll do if you can't publish a message. Also, message buses often have limitations on the size of messages, the number of messages that can be queued, and how long those messages can be held onto before they are discarded. It's important that you are aware of these constraints and limitations, because if the message handlers get too far behind (or aren't running for some reason), you can end up losing messages.
In summary, although messaging offers lots of compelling benefits, it shouldn't be thought of as a magical solution that will make all your problems go away. When introducing these asynchronous integration patterns, it's very important that you are aware of the potential problems, and have plans for how to deal with them. Having said that, in many cases, the benefits outweigh the downsides, and so in the next installment, we'll start to look at one of my favourite messaging platforms - Azure Service Bus.