
In C# we work with collections of data all the time, and .NET provides several types for storing collections, such as Lists, Arrays, Dictionaries, and HashSets.

But one question that often comes up during code review is "what types should we use to pass collections around in code?" When methods take collections as parameters, or return them, should we use concrete types like List<T>, or should everything be IEnumerable<T>? And what about DTOs that get serialized into JSON or XML and back out again? How should we declare collection properties on DTOs?

In this post we'll look at three scenarios: (1) passing collections into methods, (2) returning collections from methods, and (3) collection properties on DTOs.

Passing collections into methods

If your method needs to accept a collection as a parameter, then generally IEnumerable<T> is the best choice. This offers the most flexibility to the caller. It doesn't matter what concrete collection type they use, and it even allows them to pass lazily evaluated sequences in.

For example, in the following code snippet, ProcessOrders2 is a better choice as it allows the caller to pass in a lazily evaluated IEnumerable generated by the LINQ Where method.

void ProcessOrders1(List<Order> orders)
{
    // ...
}
void ProcessOrders2(IEnumerable<Order> orders)
{
    // ...
}

void Caller() 
{
    ProcessOrders1(myOrders.Where(o => !o.Shipped)); // won't compile - requires us to add a ToList
    ProcessOrders2(myOrders.Where(o => !o.Shipped)); // works
}

When might you use something other than IEnumerable<T>? Sometimes I've seen code where a List<T> is passed to a method because that method is going to add elements to the list. Generally I prefer to take a more functional programming approach where methods do not modify the parameters they are passed. Methods that modify their parameters make your code harder to reason about and test, and introduce potential thread safety issues, especially when you're dealing with async methods.

So avoid writing methods that look like this:

// avoid modifying collections passed into methods
async Task AddCustomer(Guid id, List<Customer> customers)
{
    var customer = await LoadCustomerAsync(id); // some (hypothetical) data access call
    customers.Add(customer); // mutates the caller's list
}

Another common scenario is where your method needs to iterate through the collection more than once, or access the count of items it's been passed. If the parameter type is IEnumerable<T>, we can't know that it's safe to enumerate it more than once - enumeration could be an expensive operation that goes to a database, or it could even return different results the second time through. Here's a slightly contrived example of a method that enumerates the orders parameter up to three times:

// avoid: multiple enumeration
void ProcessOrders(IEnumerable<Order> orders)
{
    Console.WriteLine($"Processing {orders.Count()} orders");
    var allValid = true;
    foreach(var order in orders)
    {
        if (!IsValid(order))
        {
            Console.WriteLine($"Order {orders.Id} is invalid");
            allValid = false;
        }
    }
    if (allValid)
    {
        foreach(var order in orders)
        {
            PrintShippingNote(order);
        }
    }
}

One way we could solve this is for the ProcessOrders method to simply perform a ToList on orders to get them into memory. That would allow it to enumerate the list multiple times. This approach is nicest from the caller's perspective: they can still provide an IEnumerable<T> if they want.
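
Something like this minimal sketch would do it (keeping the same hypothetical Order processing as the example above):

// keep the flexible IEnumerable<T> signature, but enumerate the source only once
void ProcessOrders(IEnumerable<Order> orders)
{
    var orderList = orders.ToList(); // materialize up front
    Console.WriteLine($"Processing {orderList.Count} orders");
    // ... the rest of the method works against orderList,
    // so multiple passes over the data are safe
}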

But suppose ProcessOrders and all its callers are under my control and I know I don't need the flexibility of passing an IEnumerable<T>. In that case I might simply choose to declare the parameter type as IReadOnlyCollection<T> instead. IReadOnlyCollection<T> allows us to be sure that all items are already available in memory, so we can safely enumerate multiple times, and it also exposes a Count property. So it's worth considering instead of IEnumerable<T> if you find you're adding unnecessary calls to .ToList on objects that are probably already lists in the first place.
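
As a rough sketch of what that might look like (again using the hypothetical Order type from earlier):

// the caller must supply an in-memory collection, so Count and
// repeated enumeration are both cheap and safe
void ProcessOrders(IReadOnlyCollection<Order> orders)
{
    Console.WriteLine($"Processing {orders.Count} orders");
    foreach (var order in orders)
    {
        // validate and print shipping notes as before
    }
}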

In summary, my recommendations for passing collections into methods are:

  • Use IEnumerable<T> where possible
  • Avoid passing concrete collection types
  • Avoid modifying collections passed as parameters
  • Consider IReadOnlyCollection<T> if you need to enumerate multiple times and your callers are easily able to provide an in-memory list

Returning collections from methods

Because IEnumerable<T> is arguably the best type to use to pass a collection to a method, many developers assume that it is also the best type to return a collection from a method. So we might declare a method like this:

IEnumerable<Order> GetOrders(DateTime orderDate)
{
    // ...
}

It's not a bad choice, but it does mean that the caller cannot make any assumptions about the collection they have been given. For example, they don't know whether enumerating the return value will be an in-memory operation or a potentially expensive action. It could even throw an exception as they enumerate. They also don't know if they can safely enumerate it more than once. So often the caller ends up doing a .ToList or similar on the collection you returned, which is wasteful if it was already a List<T>.

You end up seeing a lot of code like this:

var orders = GetOrders(orderDate).ToList(); // we want to enumerate orders multiple times, so convert to a List

This is actually another case where IReadOnlyCollection<T> can be a good fit, if you know your method is always going to return an in-memory collection. It gives your caller the ability to access the count of items and iterate through as many times as they like, but doesn't allow them to modify the collection, which might be important if you have a method that returns a cached list like this:

private List<string> validFileExtensions = ...;
// ensure that callers aren't able to modify this list
public IReadOnlyCollection<string> GetValidFileExtensions()
{
    return validFileExtensions;
}

So in summary, when a method returns a collection:

  • Consider IReadOnlyCollection<T> if you always return an in-memory list, and your callers would benefit from having the whole collection already in memory.
  • Use IEnumerable<T> where you want to offer callers the ability to partially enumerate through, and potentially generate items in the sequence on the fly.

Collections as properties on DTOs

What about when we declare data transfer objects (DTOs) that are going to be serialized to JSON, perhaps as part of a web API request or a queue message? I've seen lots of approaches here. Here are some common examples:

class Example1
{
    public IEnumerable<string> People { get; set; }
}

class Example2
{
    public IReadOnlyCollection<string> People { get; set; }
}

class Example3
{
    public ICollection<string> People { get; set; }
}

class Example4
{
    public IList<string> People { get; set; }
}

Newtonsoft.Json has no problem serializing and deserializing any of the above code samples. It actually deserializes them all to a List<T>, except for Example2 where it creates a ReadOnlyCollection. Example1 and Example2 require us to set the entire list, fully populated, when we create an instance in code, while Example3 and Example4 let us add or remove elements from the People collection after creating the DTO.
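
If you want to see this behaviour for yourself, a quick experiment along these lines (the JSON string here is just made up for illustration) shows which concrete types Newtonsoft.Json picks:

using System;
using Newtonsoft.Json;

var json = "{\"People\":[\"Alice\",\"Bob\"]}";

var example3 = JsonConvert.DeserializeObject<Example3>(json);
Console.WriteLine(example3.People.GetType()); // a List<string>, as described above
example3.People.Add("Carol"); // ICollection<string> lets us modify the DTO afterwards

var example2 = JsonConvert.DeserializeObject<Example2>(json);
Console.WriteLine(example2.People.GetType()); // a read-only collection, so there's no Add to call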

Personally, I would avoid IEnumerable<T> (Example1) as it seems unnecessarily restrictive given that we know all the items are available in memory for this type of object. I would prefer IReadOnlyCollection<T> here (Example2), allowing callers to access the Count of items easily.

One nice thing about Newtonsoft.Json is that it can successfully deserialize instances where we don't even put a public setter on the collection property, like this:

class Example5
{
    public Example5(IEnumerable<string> people)
    {
        People = people.ToList();
    }
    public IReadOnlyCollection<string> People { get; }
}

Personally I don't tend to bother with that as it's cumbersome to write on DTOs with many properties. Hopefully if C# 8 record types become a thing, it will be easy to declare an immutable DTO type that supports deserialization.

Summary

There is a huge variety of collection types in .NET, and it's not always obvious which is the most appropriate one to use when passing collections around in method calls and DTOs. IEnumerable<T> is a good fit for many scenarios, but do consider that IReadOnlyCollection<T> might be a better fit in circumstances where the collection is always going to be fully available in memory. Avoid passing around mutable collection types, as this can cause confusion about who owns the collection.

Of course, there is a lot more that could be said about this. I've not touched at all on the newer immutable collections, which are great for a more functional approach to collections, or on Span<T> which is a great fit for high performance mutable collections of primitives. Do feel free to let me know in the comments what approach you take.



The cold start problem

Whenever I talk about Azure Functions, the subject of "cold start" invariably causes concern. A detailed overview of cold starts in Azure Functions is available here, but the simple explanation is that the Azure Functions consumption plan adds and removes instances of the functions host dynamically, which means that when your function is triggered, there might not currently be an instance available to handle it immediately. If that's the case, a new instance of the functions host is started on demand, resulting in a brief delay before it handles its first request - this is called a "cold start". If your function app has been idle for more than about 20 minutes, then you will likely experience a cold start the next time a function is triggered.

How long does a typical cold start take? In the early days of Azure Functions it could be quite painful - I often saw waits in the 20-30 second range. Things have got a lot better since then, and my friend Mikhail Shilkov has done a brilliant job benchmarking cold start times in Azure Functions in various configurations. For C# functions, it seems cold starts are typically in the range of 2-3 seconds, although they can sometimes be as high as 10 seconds.

Does it even matter?

It's worth pausing to ask whether cold starts really matter for your application. After all, for many trigger types such as new messages on a queue, scheduled tasks, or new blobs appearing in blob storage, a human isn't sat there waiting for a web-page to load, and so the occasional added latency of a cold start might not be an issue.

Even your HTTP triggered functions are not necessarily being called by a human. For example, if you're implementing a web-hook, the cold start time might not matter too much.

Obviously if your functions are implementing APIs that are called from web-pages, then a cold start potentially introduces a poor user experience, which might be an issue for you. Of course, the cold start time is only part of the bigger picture of responsiveness - if your function code is slow, or it has downstream dependencies on slow external systems, then eliminating the cold start time will only go part-way to addressing your performance issues.

But if cold starts are a problem, what can be done about them? Mikhail has provided a few useful suggestions for reducing cold start times on his blog. He shows that deployment and logging techniques can affect cold start time. So there are ways to reduce the cold start impact.

But in the rest of this article, I want to highlight a few other approaches you could consider if cold starts really do pose a problem for you. Can we avoid them altogether?

Workaround #1 - Warmup request

I've heard of a few people using a timer triggered function to keep their Function App constantly warm. This feels a bit hacky to me, as it essentially exploits a loophole in the consumption pricing plan. I've not tried it myself, but I see no reason why it wouldn't work. You'd need to run it at least every 20 minutes (probably every 15 to be on the safe side), which works out at around 2880 invocations per month - a minimal cost.
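
As a rough sketch, assuming the standard C# timer trigger bindings (the function name and schedule here are just examples), it could look something like this:

using System;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;
using Microsoft.Extensions.Logging;

public static class KeepWarm
{
    // runs every 15 minutes to stop the Function App going cold
    [FunctionName("KeepWarm")]
    public static void Run([TimerTrigger("0 */15 * * * *")] TimerInfo timer, ILogger log)
    {
        log.LogInformation($"Warmup function executed at {DateTime.UtcNow}");
    }
}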

A more elegant variation on this theme would be to try to warm up your Function App just in time. Maybe you know that at a certain time in the morning the Function App is likely to be cold and so you wake it up just before you expect users to come online. Or maybe when a user logs into your system, or visits a certain webpage, you know that a function is likely to be triggered soon. In that case you could send a simple "warmup" request in advance of the real one. Here's an example of this technique in action.
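
For example, a minimal sketch of a just-in-time warmup call might look like this (the function URL is a placeholder - point it at a cheap HTTP triggered function in your app):

using System.Net.Http;
using System.Threading.Tasks;

public static class FunctionWarmer
{
    private static readonly HttpClient httpClient = new HttpClient();

    // fire a throwaway request to wake the Function App up ahead of the real traffic
    public static Task WarmUpAsync()
    {
        // placeholder URL - replace with your own HTTP triggered warmup function
        return httpClient.GetAsync("https://my-function-app.azurewebsites.net/api/warmup");
    }
}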

Workaround #2 - App Service Plan

Many people are not aware that with Azure Functions, you don't have to host using the serverless "consumption" plan. If you prefer, you can just use a regular Azure App Service Plan, which comes with a fixed monthly fee per server instance, and use that to run your Function Apps.

With this option, you lose the benefits of per-second billing, and you also lose the rapid elastic scale (although an App Service Plan can be configured to scale out based on CPU or on a schedule).

However, you no longer need to worry about cold starts - your dedicated compute is always available. You also get the benefit that the 5 minute function duration limitation no longer applies.

A variation on this theme would be to take advantage of the fact that the Azure Functions runtime can be hosted in a Docker container. So you could host that on a VM running Docker, or several instances on a Kubernetes cluster if you wanted. You'd have to implement any autoscaler logic yourself at the moment if you required automatic scale out though.

Workaround #3 - Premium plan

Finally, what if we could have the best of both worlds? Imagine we could have some dedicated instances that were always on, to eliminate cold starts, but could still elastically scale out beyond that in the same way that the consumption plan does.

Well, in the Feb 2019 Azure Functions live stream, Jeff Hollan announced the preview of the Azure Functions premium plan, which will offer this kind of hybrid approach to scale out.

Essentially, you'll have at least two always-on worker instances, but above that, it scales out dynamically. This plan is still in preview, but it offers a nice upgrade path from the consumption plan in the future if you do need to avoid cold starts.

Summary

"Cold starts" are an inevitable consequence of the dynamic nature of the consumption hosting plan. They are not necessarily an issue for all applications or trigger types, so its worth thinking about how important it really is to avoid them. In this article I've presented a few ways you can go about mitigating or avoiding the cold start problem. And hopefully over time we'll continue to see performance improvements to Azure Functions cold start times.

For more reading on the topic of cold starts and scaling in Azure Functions, I highly recommend checking out the following articles:

Want to learn more about how easy it is to get up and running with Azure Functions? Be sure to check out my Pluralsight courses Azure Functions Fundamentals and Microsoft Azure Developer: Create Serverless Functions.


ACI - great for short-lived workloads

Azure Container Instances (ACI) is a great service that combines the benefits of containers and "serverless". It makes it really simple to run a container in the cloud without needing to pre-provision any servers at all. And the billing model is per-second - you only pay while your containers are actually running, which can result in dramatic cost savings.

This makes them a great choice for short-lived or "bursty" workloads. There's no need to pay for a VM sitting idle most of the time when it's this quick and easy to spin up a container to do some work and then exit. In my Azure Container Instances: Getting Started course on Pluralsight, I provided some calculations to show when you would save money with ACI containers versus an always-on VM.

Up until recently, my recommendation was to only consider ACI if you were sure that at least half the time you didn't need the container running. And that was because the original pricing of ACI worked out to about double the cost of a VM. There were savings to be had, but only when you were dealing with occasional workloads.

Reduced ACI prices

However, the ACI pricing was recently reduced, which makes ACI a much more attractive proposition for situations where you don't need compute all the time, but you do need it most of the time. Let's do some quick calculations with the handy Azure pricing calculator.

For comparison purposes I've chosen to look at US dollar pricing in East US, but it should be roughly similar whatever your region.

If we price up a Linux ACI instance with 1 vCPU and 1GB of RAM, that costs $32.81 per month. If we switch up to 2 vCPUs and 8GB of RAM, then it's $85.12 per month. Compare that to a D2 v3 Virtual Machine (with 2 vCPUs and 8GB of RAM), which costs $70.08 per month. So although the VM is still cheaper in an always-on scenario, the difference is much closer (ACI costs about 20% more in this example).

[Image: Azure pricing calculator - ACI Linux]

With this new pricing, your container could run 20 hours a day before it matched the cost of an always-on VM.
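
As a quick sanity check of that figure, here's a rough break-even calculation using the example prices quoted above:

// rough break-even calculation based on the example prices above
var aciPerMonth = 85.12;  // 2 vCPU, 8GB Linux ACI running 24/7
var vmPerMonth = 70.08;   // D2 v3 VM
var breakEvenHoursPerDay = vmPerMonth / aciPerMonth * 24;
Console.WriteLine(breakEvenHoursPerDay); // roughly 19.8 hours per day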

Windows costs more

It's worth pointing out that there's an additional charge for Windows (an extra $63 in this example). This means the same size Windows ACI container costs $148.19 to run for a month, compared with $137.24 to run Windows on an equivalently sized D2 v3 VM. The ACI markup is smaller in this example - only about 8% more.

[Image: Azure pricing calculator - Windows]

What should I use?

VMs still win in always-on scenarios, so if you're running a web-site that needs to be available 24/7, ACI is not the best choice. But in almost all other cases, if your workload doesn't need to run all day every day, ACI has the potential to save you some money.

One advantage VMs do still have up their sleeve is the substantial "reserved instance" pricing discounts. If those are available to you, then it may be cheaper to run VMs even if they sit idle half the time.

There are a few more considerations when picking between VMs and ACI. If your usage patterns are very predictable - e.g. you only run during working hours on weekdays - then you can automate turning VMs on and off (making sure they are stopped and deallocated) as a way of cutting costs without needing to use ACI. But booting VMs is typically a lot slower than starting a container, so ACI will be more responsive when load is unpredictable and you want to start up new instances quickly. ACI is also likely to have lower memory requirements - there isn't as much OS overhead, meaning that an 8GB VM might actually be more comparable to a 7GB container instance, making the pricing comparison even closer.

In short, you should do your own calculations to decide whether you'd get the cheapest compute with VMs or ACI, but the reduced ACI pricing makes ACI a much more attractive option in many scenarios. It also makes the AKS Virtual Kubelet and AKS virtual nodes, which use the elastic scale of ACI to rapidly scale applications on AKS, a much more compelling option.

Want to learn more about how easy it is to get up and running with Azure Container Instances? Be sure to check out my Pluralsight course Azure Container Instances: Getting Started.