Posted in:

Last week I was tasked with tracking down a perplexing problem with an API - every call was returning a 500 error, but there was nothing in the logs that gave us any clue why. The problem was not reproducible locally, and with firewall restrictions getting in the way of remote debugging, it took me a while to find the root cause.

Once I had worked out what the problem was, it was an easy fix. But the real issue, was the fact that we had some codepaths where exceptions could go unlogged. After adding exception logging to some places that had been missed, it became immediately obvious what was going wrong. Had this issue manifest itself on a production system, we could have been looking at prolonged periods of down-time, simply because we weren't logging these exceptions.

So this post is a quick reminder to check all your services - are there any places where exceptions can be thrown that don't end up in your logs? Here's three places to check.

Application startup

When an application starts up, one of the first things you should do is create a logger, and log a "Starting up" message at informational level. This is invaluable in providing a quick sanity check that at your application code did in fact start running and that it is correctly configured for logging.

I also like to log an additional message once all the startup code has completed. This alerts you to any problems with your service only managing to get half-way through initialization, or if there is a long-running operation hidden in the start-up code (which is usually a design flaw).

Of course, you should also wrap the whole startup code in an exception handler, so that any failures to start the service are easy to diagnose. Something like this is a good approach:

public void Main()
{
    var logger = CreateLogger();
    try 
    {
        logger.Information("Starting up");
        Startup();
        logger.Information("Started up");
    }
    catch (Exception ex)
    {
        logger.Error(ex, "Startup error");
    }
}

Middleware

In our particular case, the issue was in the middleware of our web API. This meant the exception wasn't technically "unhandled" - a lower level of the middleware was already catching the exception and turning it into a 500 response. It just wasn't getting logged.

Pretty much all web API frameworks provide ways for you to hook into unhandled exceptions, and perform your own custom logic. ASP.NET Core has exception middleware that you can customize, and the previous ASP.NET MVC allows you to implement a custom IExceptionHandler or IExceptionLogger. Make sure you know how to do this for the web framework you're using.

Long-running threads

Another place where logging can be forgotten is in a long-running thread such as a message pump, that's reading messages from a queue and processing them. In this scenario, you probably have an exception handler around the handling of each message, but additionally you need to log any exceptions at the pump level - e.g. if it loses connection to the message broker, you don't want to die silently and end up no longer processing messages.

In this next sample, we've remembered to log exceptions handling a message, but not exceptions fetching the next message.

while(true)
{
    // don't forget to handle exception that happen here too!
    var message = FetchNextMessage(); 
    try
    {
        Handle(message);
    }
    catch(Exception ex)
    {
        logger.Error(ex, "Failed to handle message");
        // don't throw, we want to keep processing messages
    }
}

You might already have this

Of course, some programming frameworks and hosting platforms have good out-of-the box logging baked in, which saves you the effort of writing this yourself. But it is worth double-checking that you have sufficient logging of all exceptions at whatever point they are thrown. An easy way to do this is to just throw a few deliberate exceptions in various places in your code (e.g. MVC controller constructor, middleware, application startup, etc), and double-check that they find their way into the logs. You'll be glad you did so when something weird happens in production.

In a world of microservices, observability is more critical than ever, and ensuring that all exceptions are adequately logged is a small time investment that can pay big dividends.

Comments

Comment by Marc Roussy

I was working on a similar problem this week - having detailed logs was the only way I could figure out what was happening as well. To avoid overpolluting the logs I tend to set the default log level to Information, which gives me a basic idea of what is happening in the application. But I'll add Debug() calls with extra details where it's appropriate. Then it's just a matter of increasing the log level to Debug and I can dig in to the problem.

Marc Roussy