
In this article I want to discuss the materialized view pattern and why I think serverless is a good match for it. I've written this as my submission to the "Serverless September" initiative. Be sure to check out the other great contributions to this series.

Materialized Views

Let's start with a brief overview of what is meant by a "materialized view". In an ideal world, we'd store all of our data in a database that was perfectly optimized for every query we threw at it. No matter what question we asked of the data, it would respond quickly and efficiently.

In the real world, unfortunately, that's just not possible. Whether you're storing your data in a relational database like SQL Server or a document database like Cosmos DB, there are going to be occasions where the queries you want to run are simply too slow and costly.

The materialized view pattern is a simple idea. We build a new "view" of the data that stores the data in just the right way to support a particular query. There are several different approaches to how the materialized view is generated, but the basic idea is that every time we update application data, we also update any materialized views that depend on that data.

You can create as many materialized views as you want. It's quite common to create one for a specific view on a web page or mobile application, since performance is especially important when an end user is waiting for a page to load.

An Example Scenario

Let's consider a very simple example scenario. Imagine we have an e-Commerce website. Our database will store products (all the things we sell) and orders (all the things we've sold).

If our database is a relational database, then we'd probably have at least three tables - one for products, one for orders, and one for "order lines", since a single order may contain multiple products.

[Image: database schema example]

And if our database was a document database, then we'd probably use a document for each product and a document for each order, including its "order lines". Here's a simplified representation of what an order document might look like in a document database:

{
    "id": 1234,
    "customerEmail": "[email protected]",
    "lines": [
        { "productId": 129512, "quantity": 1 },
        { "productId": 145982, "quantity": 2 }
    ]
}

So far, so good. But let's imagine that when a customer views a product, we also want to show them some recommendations of a few products that other customers who bought this item also purchased.

Whether we use a relational database or a document database, this query is perfectly possible to implement. All we need to do is find all the orders that include the product currently being viewed, group their "order lines" by product id, sort them by the most popular, and take the top few rows. Simple! But very slow and expensive. If we have millions of orders in our database, the query will probably take far too long to complete for the page to load without timing out.
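To make the shape of that aggregation concrete, here's a minimal sketch in LINQ over in-memory collections - purely illustrative, with simplified Order and OrderLine types that stand in for whatever your data access layer actually exposes (in a real system this query would run in the database, not in memory):

using System.Collections.Generic;
using System.Linq;

// simplified stand-in types matching the order document shown above
public class OrderLine { public string ProductId { get; set; } public int Quantity { get; set; } }
public class Order { public string Id { get; set; } public List<OrderLine> Lines { get; set; } }

public static class NaiveRecommendations
{
    // the expensive "bought together" query: scan every order containing the
    // product being viewed, group the co-purchased lines by product, take the top few
    public static IEnumerable<string> For(string productId, IEnumerable<Order> allOrders, int top = 5) =>
        allOrders
            .Where(o => o.Lines.Any(l => l.ProductId == productId)) // orders containing this product
            .SelectMany(o => o.Lines)                               // flatten to order lines
            .Where(l => l.ProductId != productId)                   // exclude the product being viewed
            .GroupBy(l => l.ProductId)                              // group by co-purchased product
            .OrderByDescending(g => g.Sum(l => l.Quantity))         // most popular first
            .Take(top)
            .Select(g => g.Key);
}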

Instead we can create a materialized view that is specifically designed to return the product recommendations. For each product we could keep a running total of the counts of other items purchased at the same time. And now what was previously an expensive aggregate operation across all order lines in the entire database becomes a simple lookup by product id.
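For the recommendations example, an entry in that materialized view might look something like this - a hypothetical shape, with one entry per product being viewed and running counts of the products bought alongside it (the counts shown are made up):

{
    "id": "129512",
    "recommendations": {
        "145982": { "count": 12 }
    }
}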

Eventual Consistency

Having decided we want a materialized view, we have a problem. How will we keep the view up to date?

Basically every time a new order comes in, as well as saving it to the database, we also need to update the materialized view. If a new order is placed for products A and B, then we need to update two rows in our materialized view: the row for product A to add B as a recommendation, and vice versa.

The most common approach taken for materialized view generation is asynchronous updates. For example, when a new order comes in, we might publish a message to an event bus, and a subscriber to that message can be responsible for updating the materialized view.
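As a rough sketch of how that could look with Azure Functions and Azure Service Bus - the "new-orders" queue name, "ServiceBusConnection" setting, Order type and UpdateRecommendationsAsync helper are all assumptions for illustration, not part of the sample that appears later in this post:

using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;

public static class RecommendationsViewUpdater
{
    [FunctionName("UpdateRecommendationsView")]
    public static async Task Run(
        // triggered whenever the ordering code publishes a new order message
        [ServiceBusTrigger("new-orders", Connection = "ServiceBusConnection")] string message,
        ILogger log)
    {
        var order = JsonConvert.DeserializeObject<Order>(message);
        log.LogInformation("Updating recommendations for order {OrderId}", order.Id);
        await UpdateRecommendationsAsync(order); // apply this order to the materialized view
    }

    private static Task UpdateRecommendationsAsync(Order order)
    {
        // placeholder: increment the "bought together" counts for each pair of products in the order
        return Task.CompletedTask;
    }
}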

This approach has several benefits. It means the code storing a new order can remain quick, and the potentially slower operation of updating the materialized view can happen in the background later. It also means we can scale out with many materialized views being updated in parallel.

But it does also mean that there will be a short window of time where our materialized view isn't fully up to date. This is known as the problem of "eventual consistency". The recommendations materialized view might not be fully up to date immediately, but will eventually become so.

This idea can worry people. Isn't it bad to show our customers data that isn't fully up to date? Actually it turns out that in many cases it doesn't matter as much as you might think. In our example, it hardly matters at all. So what if the recommendations I see for related products aren't perfectly up to date?

Of course, the asynchronous approach to updating materialized views isn't strictly required. There are ways to put the update to the main data and all materialized views inside a single transaction, so they are always consistent. However, this comes at a significant performance penalty and may also involve the complexities of distributed transactions.

Why Serverless?

Now that we've considered what the materialized view pattern is, and why we might want to use it, let's see why I think serverless is a good fit for implementing these views.

First, materialized view updates are usually event-driven. And serverless "Functions as a Service" frameworks like Azure Functions are great for implementing event handlers. You could create an Azure Function for each materialized view you want to keep up to date. The built-in scalability of Azure Functions means that even if you have lots of materialized views to update, they can be processed in parallel.

Second, sometimes you need to rebuild an entire materialized view from scratch. This is common when you're introducing a new materialized view, so you can't just subscribe to updates to the source data - you need to go back to the beginning. This generates a significant backlog of work, and serverless platforms like Azure Functions have the capacity to rapidly scale out to help you catch up quickly.

Alternatively, sometimes materialized views are generated on a schedule. For example, maybe every night at midnight you completely rebuild one of the views. Again, this is something that serverless is great at - it's easy to create a timer-triggered Azure Function (although if you're processing a lot of data in a single function you'll need to take care not to time out - on the consumption plan, Azure Functions have a time limit which defaults to 5 minutes).
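As an illustration, a nightly rebuild could be a timer-triggered function like the sketch below, assuming a hypothetical RebuildViewAsync helper that does the actual work:

using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class NightlyViewRebuild
{
    [FunctionName("RebuildRecommendationsView")]
    public static async Task Run(
        // NCRONTAB expression (sec min hour day month day-of-week): fires at midnight every night
        [TimerTrigger("0 0 0 * * *")] TimerInfo timer,
        ILogger log)
    {
        log.LogInformation("Starting nightly rebuild of the recommendations view");
        await RebuildViewAsync(); // hypothetical helper that re-reads the source data and rewrites the view
    }

    private static Task RebuildViewAsync() => Task.CompletedTask; // placeholder for the actual rebuild
}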

Finally, it can be very hard to predict ahead of time how much compute resource is needed to maintain a materialized view. Some views may require very infrequent updates, while others are constantly churning. Some changes can come in bursts. Again, having a platform that dynamically scales out to meet demand, but also can scale to zero to save money during idle periods gives you a cost-effective way to pay only for the compute you need.

Cosmos DB change feed

The materialized view pattern can be implemented against any database, including SQL Database and Cosmos DB. In fact, it's quite common to pair this pattern with "event sourcing", where you simply store "events" that represent the changes to application state, rather than the current state of the application, which is typically what a regular database would store.

Cosmos DB has a really helpful feature called the "change feed" which allows you to replay all the changes to documents in a collection. That is to say, every time a new document is created or updated, a new entry appears in the change feed. You can then either subscribe to that change feed, to be notified about every change, or replay it from the start, to receive notifications of all historic changes. This makes it ideal for both generating a new materialized view and keeping it up to date.

The change feed needs to keep track of how far through the backlog of change events you are, and this is done by maintaining a "lease" which is very simply stored in a container in your CosmosDB database.

Azure Functions comes with a built-in trigger called the CosmosDBTrigger that makes it really easy to subscribe to these change notifications, and use them to update your materialized views. And it gives you the changes in batches (typically 100 changes in a batch), enabling you to make your materialized view update code more efficient.

Sample application

To demonstrate serverless materialized view generation with the Cosmos DB change feed, I decided to create a very simple proof of concept application. The scenario I chose to implement was the e-Commerce example I described above, where we want to generate product "recommendations" based on order history from all customers.

I chose Cosmos DB for my database, and my database design is very simple: a products collection which holds a document per product, and an orders collection which holds a document for each order. For the recommendations, I could have created a recommendation document per product (so the recommendation materialized view was entirely separate), but for this demo I decided to store the recommendations directly in the product document itself.

I made a very simple console application to auto-generate products and random orders that use the products. Here's the code for the product generator and the order generator.

I also wrote a function to produce the recommendations on the fly without the aid of a materialized view. This shows just how slow and expensive performing this query is. (With Cosmos DB, every query consumes a number of "request units" (RUs), and the more orders you have, the more RUs this query will consume.) It was already getting very slow with just a few thousand orders in the database, and I suspect that by the time there are millions of orders, it simply wouldn't be possible to run this in real time.

Then I created an Azure Function application that contained a single function to update my materialized view. This uses a CosmosDBTrigger which means this function will be triggered every time there is an update to documents in the collection we're monitoring. In our case, it's the orders collection, so any new orders will trigger this function. I also need to specify where the "lease" is stored that will keep track of how far through the updates we are. And I've set StartFromBeginning to true, although technically for a materialized view like our recommendations there is no need to do that if we don't mind starting with no recommendations.

I'm also using a CosmosDB binding so that I can perform additional queries and updates to the products collection. Here's the function declaration:

[FunctionName("UpdateRecommendationView")]
public static async Task Run([CosmosDBTrigger(
        databaseName: databaseName,
        collectionName: Constants.OrdersContainerId,
        ConnectionStringSetting = connectionStringSettingName,
        LeaseCollectionName = "leases",
        StartFromBeginning = true,
        CreateLeaseCollectionIfNotExists = true)]
    IReadOnlyList<Document> input,

    [CosmosDB(
        databaseName: databaseName,
        collectionName: Constants.ProductsContainerId,
        ConnectionStringSetting = "OrdersDatabase")] DocumentClient productsClient,

    ILogger log)

Inside the function, the input parameter that the CosmosDBTrigger is bound to contains a list of updates. Each update includes the full document contents, so we can simply loop round and examine each order. In our case, for each order that contains more than one product, we must update the product document for each product in that order, incrementing the counts of the other products bought alongside it.

Obviously this could be quite slow, but that's one of the advantages of building these views asynchronously - it's happening in the background and not affecting the end user's experience. I could also rework this code slightly to take advantage of the batched nature of the updates, which would possibly reduce the number of document updates we need to make for each batch - currently we would update the same product more than once if it featured in multiple orders in the batch (see the sketch after the code below).

I did make one small performance enhancement, which is to ensure that the recommendation list size for each product doesn't grow above 50, just using a very simplistic technique. Obviously a real-world recommendations engine would be more sophisticated.

foreach (var doc in input)
{
    Order order = (dynamic) doc;
    log.LogInformation("Order: " + order.Id);

    if (order.OrderLines.Count <= 1)
    {
        // no related products to process
        continue;
    }
    foreach (var orderLine in order.OrderLines)
    {
        var docUri = UriFactory.CreateDocumentUri(databaseName, Constants.ProductsContainerId, orderLine.ProductId);
        var document = await productsClient.ReadDocumentAsync(docUri,
            new RequestOptions { PartitionKey = new PartitionKey("clothing") }); // partition key is "/Category" and all sample data is in the "clothing" category
        Product p = (dynamic) document.Resource;
        if (p.Recommendations == null)
        {
            p.Recommendations = new Dictionary<string, RelatedProduct>();
        }

        log.LogInformation("Product:" + orderLine.ProductId);
        foreach (var boughtWith in order.OrderLines.Where(l => l.ProductId != orderLine.ProductId))
        {
            if (p.Recommendations.ContainsKey(boughtWith.ProductId))
            {
                p.Recommendations[boughtWith.ProductId].Count++;
                p.Recommendations[boughtWith.ProductId].MostRecentPurchase = order.OrderDate;
            }
            else
            {
                p.Recommendations[boughtWith.ProductId] = new RelatedProduct()
                {
                    Count = 1,
                    MostRecentPurchase = order.OrderDate
                };
            }
        }
        if (p.Recommendations.Count > 50)
        {
            // simplistic way to keep recommendations list from getting too large
            log.LogInformation("Truncating recommendations");
            p.Recommendations = p.Recommendations
                .OrderByDescending(kvp => kvp.Value.Count)
                .ThenByDescending(kvp => kvp.Value.MostRecentPurchase)
                .Take(50)
                .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
        }
        // update the product document with the new recommendations
        // (another optimization would be to not update if the recommendations hadn't actually changed)
        await productsClient.ReplaceDocumentAsync(document.Resource.SelfLink, p);
    }
}
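To illustrate the batch optimization mentioned above, here's a rough sketch (not part of the sample code) of how the incoming batch could be aggregated up front so that each affected product document is read and written at most once per batch. The grouping shape is an assumption; the actual Cosmos DB reads and writes would be the same ReadDocumentAsync / ReplaceDocumentAsync calls used in the function above:

// flatten the whole batch into (product, bought-with) pairs before touching Cosmos DB
var orders = new List<Order>();
foreach (var doc in input)
{
    Order order = (dynamic) doc; // same conversion trick as above
    if (order.OrderLines.Count > 1) orders.Add(order);
}

var pairsByProduct = orders
    .SelectMany(o => o.OrderLines.SelectMany(line => o.OrderLines
        .Where(other => other.ProductId != line.ProductId)
        .Select(other => new { line.ProductId, BoughtWithId = other.ProductId, o.OrderDate })))
    .GroupBy(p => p.ProductId);

foreach (var productGroup in pairsByProduct)
{
    // read the product document once, apply every increment (and MostRecentPurchase update)
    // from this batch, then write it back once
}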

Summary

The materialized view pattern is a very useful and powerful way to enable complex queries to be performed rapidly. The supporting views can be generated asynchronously, and serverless platforms like Azure Functions are a great fit for this. The CosmosDB change feed greatly simplifies the task of setting this up, and has built-in support in Azure Functions.

Despite not having used the Cosmos DB change feed processor before, I found it really quick and easy to get up and running with. You can learn more about the Cosmos DB change feed here.

Want to learn more about how easy it is to get up and running with Azure Functions? Be sure to check out my Pluralsight courses Azure Functions Fundamentals and Microsoft Azure Developer: Create Serverless Functions.


In this post I want to describe a high-level strategy for migrating legacy ASP.NET MVC applications to ASP.NET Core. In one of the projects I work on there is a lot of code that was written in ASP.NET MVC, and uses Entity Framework.

Now, in one sense, there's nothing wrong with this - the "classic" ASP.NET MVC framework still works just fine. But there are a growing number of reasons to want to get away from the legacy platform and onto ASP.NET Core. Here are a few of my top reasons:

  • Classic ASP.NET runs on Windows only, limiting your options if you want to containerize your services
  • ASP.NET Core is much faster
  • Classic ASP.NET is no longer under active development, meaning it doesn't benefit from newer innovations and blocks you from adopting several newer technologies (like Entity Framework Core or gRPC)

Complete rewrite?

Of course, for many developers, moving to a new technology stack presents an opportunity to completely rewrite your service. This allows you to clear off a bunch of technical debt in one go, and end up with a faster, more maintainable service with a better API design.

However, it's well known that software rewrites often tend to take a lot longer than expected to complete and can result in significant regressions. That's because legacy services tend to become bloated with too many responsibilities (something that a good microservices architecture hopefully avoids), and also because legacy services often don't come with a comprehensive suite of integration tests that could serve as an acceptance test for the rewritten service.

Keep as much as possible

The alternative is to keep as much of your existing code as possible. The main challenge we faced here was the fact that you're not just moving from ASP.NET MVC to ASP.NET Core MVC, but you also need to move from Entity Framework to Entity Framework Core at the same time.

However, the good news is that Entity Framework 6.4 bridges the gap between the worlds of legacy .NET Framework and .NET Core. It dual targets both .NET 4.5 and .NET Standard 2.1. This greatly reduces the amount of code that needs to change as part of the migration (although of course you may still want to upgrade to EF Core at a later date).

A recipe for migrating

Here's a simple process that we're following to slowly turn some legacy ASP.NET MVC services into .NET Core applications.

First, upgrade to use Entity Framework 6.4. This is relatively straightforward, and a lot less effort than rewriting to use EF Core which does have a different feature set.

Second, identify any NuGet dependencies that don't target .NET Standard, and find alternatives. We found that most of our dependencies that targeted .NET Framework could be upgraded to newer versions that supported .NET Standard.

But there were a few tricky things like libraries for PDF generation or image processing. These have the potential to be blockers to getting to a point where you can run cross-platform, so you may need to consider breaking out any legacy code that absolutely cannot be ported into a separate microservice that still runs on .NET Framework 4.7.2.

Of course, many of your dependencies may be your own shared libraries. We converted these to target .NET Standard 2.0, meaning that newer .NET Core services and legacy .NET Framework services could both take dependencies on them.

Third, move all business logic and database access code out into library classes that dual target .NET Framework 4.7.2 and .NET Standard 2.1 (this is needed because EF 6.4 doesn't target .NET Standard 2.0). This requires using thin controllers rather than thick controllers, where the controller's only job is to call (maybe via a mediator) into a request handler and then convert its response into the appropriate HTTP response object.

Fourth, once you've reached this point, you can create a new ASP.NET Core 3.1 application, and reference all your business logic libraries. Then you need to recreate all your controllers and routes, as well as set up all your middleware (e.g. authentication, swagger) again. This part is a rewrite, but it's a much smaller scoped rewrite than would have been necessary if all the database access code had to change as well.
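To give a feel for the end state, here's a minimal sketch of what one of those recreated "thin" ASP.NET Core controllers might look like. The IGetOrderHandler, GetOrderRequest and OrderDto types are hypothetical stand-ins for whatever request handlers your dual-targeted business logic library actually exposes:

using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// hypothetical types living in the shared, dual-targeted business logic library
public class GetOrderRequest { public string Id { get; set; } }
public class OrderDto { public string Id { get; set; } public decimal Total { get; set; } }
public interface IGetOrderHandler { Task<OrderDto> Handle(GetOrderRequest request); }

[ApiController]
[Route("api/orders")]
public class OrdersController : ControllerBase
{
    private readonly IGetOrderHandler handler;
    public OrdersController(IGetOrderHandler handler) => this.handler = handler;

    [HttpGet("{id}")]
    public async Task<ActionResult<OrderDto>> Get(string id)
    {
        // the controller's only job: call into the shared handler and map
        // its result onto an HTTP response - no business logic lives here
        var order = await handler.Handle(new GetOrderRequest { Id = id });
        if (order == null) return NotFound();
        return Ok(order);
    }
}

Because the controller contains no business logic, the same handler types can be exercised by both the legacy application and the new ASP.NET Core application while the migration is in progress.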

Summary

Thanks to EF 6.4 and the ability to dual target libraries, it's possible to migrate from ASP.NET MVC to ASP.NET Core MVC without having to completely rewrite all your database access code.

Let me know in the comments if you've done a similar thing or if you have any other tips for making the migration go smoothly.



I'm pleased to announce a new Pluralsight course - "Microservices Architecture: Executive Briefing". This is a short (30 minute) high-level overview of what a microservices architecture is, why you might want to use it, and some pointers on how to be successful with it.

It was also my first foray into live video recording. Unlike most Pluralsight courses, this one is mostly a video of me speaking (apologies for making you look at my face!) It was certainly a steep learning curve. I had to buy a fair bit of equipment including a new camera (I went for the Canon EOS 250D) and some video lights, and after several failed attempts I finally managed to get the desired level of image quality. I'm certainly no expert, so I'm sure there's a lot of room for improvement, but I'm pleased with how it turned out in the end.

This course is aimed at executives and decision makers rather than at programmers, so I've kept things quite high level and focus more on the why than the specifics of how. I have a couple of other courses in the library (Microservices Fundamentals and Building Microservices) that are much more developer-oriented if that's what you are looking for.

In the course I try to answer seven simple questions. First, what are microservices? This question can be answered purely in terms of what it means from a software architecture perspective (that microservices are small and autonomous), but it is also important to understand that microservices are a way to scale teams. This point was excellently made by Brendan Burns in his talk at the recent .NET Focus on Microservices conference. I highly recommend you take some time to watch some of the sessions from that event.

Second, what problems do microservices solve? "Monoliths" are not necessarily bad, and may actually be the right approach in some circumstances, but the most common problem you can run into with monoliths is the problem of scaling. Again, this is not just scaling from a software perspective, but scaling your team so that many developers can work together efficiently on the same overall product.

Third, how can microservices help? By breaking our system into microservices, there is a lot more freedom to evolve and upgrade those services independently, and for each one to use the right tool for the job.

Fourth, what does a typical microservices architecture look like? In this section I used the same eShopOnContainers reference architecture that I used in my Microservices Fundamentals course, to show some common patterns such as backend for frontends, and using an event bus for communication between microservices.

Fifth, what are some challenges of microservices? It's really important to understand that microservices don't magically make all your problems go away. In fact, they introduce some significant technical and operational challenges. Deploying and monitoring many small services is more difficult than deploying and monitoring a single monolith. And it can be difficult for developers who need to start many services to work on the system. Thankfully we have seen an explosion of new technologies and tools over the past decade explicitly designed to address these challenges. Which brings us onto...

Sixth, what tools and technologies do I need? One great thing about microservices is that they give you complete freedom to use whatever tools and technologies you want. It can even be different for each microservice. For example, one microservice might use a document database while another uses a relational database. But I do think containerization has established itself as the most appropriate way to package and deploy your microservices, and so I recommend Docker and Kubernetes.

Finally, how can I successfully lead a microservices project? This is where my "imposter syndrome" kicks in a bit. I can obviously only speak from my experience of transitioning towards microservices, and so my suggestions and advice might not necessarily be relevant in all contexts. But in this section I do highlight the importance of a change of mindset throughout the company - it's not only about a new way of architecting software, but also new approaches to team structures, new ways of releasing versions, and new skills needed to monitor applications in production.

I hope that the brief answers to these questions I provide in this course prove helpful to anyone considering whether to adopt a microservices architecture. And I'd love to hear your good (or bad) stories of how embracing microservices went for you. Learning from other people's success or failure stories can save a lot of wasted time and effort.