What types should I use to pass collections in C#?
In C# we work with collections of data all the time, and .NET provides several types for storing collections, such as Lists, Arrays, Dictionaries, and HashSets.
But one question that often comes up during code review is: what types should we use to pass collections around in code? When methods take collections as parameters, or return them, should we use concrete types like List<T>, or maybe everything should be IEnumerable<T>? And what about DTOs that get serialized into JSON or XML and back out again? How should we declare properties that contain collections on DTOs?
In this post we'll look at three scenarios: (1) passing collections into methods, (2) returning collections from methods, and (3) collection properties on DTOs.
Passing collections into methods
If your method needs to accept a collection as a parameter, then generally IEnumerable<T> is the best choice. This offers the most flexibility to the caller. It doesn't matter what concrete collection type they use, and it even allows them to pass lazily evaluated sequences in.
For example, in the following code snippet, ProcessOrders2 is a better choice as it allows the caller to pass in a lazily evaluated IEnumerable<Order> generated by the LINQ Where method.
void ProcessOrders1(List<Order> orders)
{
    // ...
}

void ProcessOrders2(IEnumerable<Order> orders)
{
    // ...
}

void Caller()
{
    ProcessOrders1(myOrders.Where(o => !o.Shipped)); // won't compile - requires us to add a ToList
    ProcessOrders2(myOrders.Where(o => !o.Shipped)); // works
}
When might you use something other than IEnumerable<T>? Sometimes I've seen code where a List<T> is passed to a method because that method is going to add elements to the list. Generally I prefer to take a more functional programming approach where methods do not modify the parameters they are passed. Methods that modify their parameters make it harder to reason about your code and test it, and introduce potential thread safety issues, especially when you're dealing with async methods.
So avoid writing methods that look like this:
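// avoid modifying collections passed into methods
// (customerRepository is a hypothetical data-access object, assumed for illustration)
async Task AddCustomer(Guid id, List<Customer> customers)
{
    var customer = await customerRepository.GetById(id);
    customers.Add(customer);
}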
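One alternative to a method like that is to have it return the item it loaded and leave it to the caller, who owns the list, to add it. The following is only a minimal sketch under some assumptions: the customer is modelled as a simple tuple, and LoadCustomerAsync is a hypothetical stand-in for whatever repository or database call would really be used.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// The caller owns the list and is the only code that modifies it.
var customers = new List<(Guid Id, string Name)>();

var id = Guid.NewGuid();
var customer = await LoadCustomerAsync(id);
customers.Add(customer);

Console.WriteLine($"Loaded {customers.Count} customer(s)");

// Hypothetical stand-in for a repository or database call.
// It returns the loaded customer instead of adding it to a list it was given.
static async Task<(Guid Id, string Name)> LoadCustomerAsync(Guid id)
{
    await Task.Delay(1); // simulate asynchronous I/O
    return (id, "Example customer");
}
```

Because the method no longer touches shared state, it can be tested by checking its return value alone, and several calls can run concurrently without racing on the list.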
Another common scenario is if your method needs to iterate through the collection more than once, or access the count of items in the collection it's been passed. If the parameter type is IEnumerable<T> in this scenario, we can't know that it is safe to enumerate more than once - it could be an expensive operation that goes to a database, or it could even return different results the second time through. Here's a slightly contrived example of a method that enumerates the orders parameter up to three times:
// avoid: multiple enumeration
void ProcessOrders(IEnumerable<Order> orders)
{
    // first enumeration: Count() walks the whole sequence
    Console.WriteLine($"Processing {orders.Count()} orders");
    var allValid = true;
    // second enumeration
    foreach (var order in orders)
    {
        if (!IsValid(order))
        {
            Console.WriteLine($"Order {order.Id} is invalid");
            allValid = false;
        }
    }
    if (allValid)
    {
        // third enumeration
        foreach (var order in orders)
        {
            PrintShippingNote(order);
        }
    }
}
One way we could solve this is for the ProcessOrders method to simply call ToList on orders to bring it into memory. That would allow it to enumerate the list multiple times. This approach is nicest from the caller's perspective: they can still provide an IEnumerable<T> if they want.
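As a rough illustration of the ToList approach, the sketch below uses plain order IDs instead of Order objects. LazyOrderIds and ProcessOrderIds are hypothetical names invented for this example, and the "validity" check is a placeholder. The counter shows that the lazy source is only walked once, even though the method loops over the data several times.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var enumerations = 0;

// A lazily evaluated sequence that records each time it is enumerated from the start.
IEnumerable<int> LazyOrderIds()
{
    enumerations++;
    yield return 101;
    yield return 102;
    yield return 103;
}

// Hypothetical simplified version of ProcessOrders working on order IDs.
int ProcessOrderIds(IEnumerable<int> orderIds)
{
    // Materialize once; everything below works on the in-memory copy.
    var list = orderIds.ToList();

    Console.WriteLine($"Processing {list.Count} orders");
    var validCount = 0;
    foreach (var orderId in list)
    {
        if (orderId > 0) validCount++; // placeholder validity check
    }
    if (validCount == list.Count)
    {
        foreach (var orderId in list)
        {
            Console.WriteLine($"Shipping note for order {orderId}");
        }
    }
    return validCount;
}

var processed = ProcessOrderIds(LazyOrderIds());
Console.WriteLine($"Source was enumerated {enumerations} time(s)");
```

The cost is one extra copy when the caller already had a list, which is the trade-off the next paragraph addresses.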
But suppose ProcessOrders and all its callers are under my control and I know I don't need the flexibility of passing an IEnumerable<T>. In that case I might simply choose to declare the parameter type as IReadOnlyCollection<T> instead. IReadOnlyCollection<T> signals that all items are already available in memory, so we can safely enumerate multiple times, and it also exposes a Count property. So it's worth considering instead of IEnumerable<T> if you find you're adding unnecessary calls to .ToList on objects that are probably already lists in the first place.
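To make that concrete, here is a small sketch of a method that takes an IReadOnlyCollection<T>. AverageOrderTotal is a hypothetical method that works on decimal totals rather than Order objects, purely to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical simplified method taking order totals rather than Order objects.
static decimal AverageOrderTotal(IReadOnlyCollection<decimal> totals)
{
    // Count is a property here, not a LINQ call that walks the sequence.
    if (totals.Count == 0) return 0m;

    var sum = 0m;
    foreach (var total in totals) sum += total;
    return sum / totals.Count;
}

var list = new List<decimal> { 10m, 20m, 30m };
var array = new[] { 5m, 15m };

// List<T> and arrays both implement IReadOnlyCollection<T>, so no conversion is needed.
var listAverage = AverageOrderTotal(list);
var arrayAverage = AverageOrderTotal(array);

// A lazy LINQ query does not, so the caller has to materialize it explicitly.
var filteredAverage = AverageOrderTotal(list.Where(t => t > 10m).ToList());

Console.WriteLine($"{listAverage} {arrayAverage} {filteredAverage}");
```

The ToList call has not disappeared, but it has moved to the one caller that actually has a lazy sequence, instead of running on every call.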
In summary, my recommendations for passing collections into methods are:
- Use IEnumerable<T> where possible
- Avoid passing concrete collection types
- Avoid modifying collections passed as parameters
- Consider IReadOnlyCollection<T> if you need to enumerate multiple times and your callers are easily able to provide an in-memory list
Returning collections from methods
Because IEnumerable<T> is arguably the best type to use to pass a collection to a method, many developers assume that it is also the best type to return a collection from a method. So we might declare a method like this:
IEnumerable<Order> GetOrders(DateTime orderDate)
{
    // ...
}
It's not a bad choice, but it does mean that the caller cannot make any assumptions about the collection they have been given. For example, they don't know if enumerating the return value will be an in-memory operation or a potentially expensive action. It could even throw an exception as they enumerate. They also don't know if they can safely enumerate it more than once. So often the caller ends up doing a .ToList or similar on the collection you returned, which is wasteful if it was already a List<T>.
You end up seeing a lot of code like this:
var orders = GetOrders(orderDate).ToList(); // we want to multiply enumerate orders, so convert to List
This is actually another case in which IReadOnlyCollection<T> can be a good fit if you know your method is always going to return an in-memory collection. It gives your caller the ability to access the count of items and iterate through as many times as they like, but doesn't allow them to modify that collection, which might be important if you have a method that returns a cached list like this:
private List<string> validFileExtensions = ...;

// ensure that callers aren't able to modify this list
public IReadOnlyCollection<string> GetValidFileExtensions()
{
    return validFileExtensions;
}
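One caveat worth knowing: returning the list through a read-only interface changes the declared type, not the object, so a determined caller could still cast it back to List<string>. If that matters, List<T>.AsReadOnly() returns a ReadOnlyCollection<T> wrapper that also implements IReadOnlyCollection<T>. The sketch below compares the two; GetDirect and GetWrapped are hypothetical names, and the extension values are made up for the example.

```csharp
using System;
using System.Collections.Generic;

var validFileExtensions = new List<string> { ".wav", ".mp3" }; // example values, assumed

// Returning the list itself: the declared type is read-only, but the object is still a List<string>.
IReadOnlyCollection<string> GetDirect() => validFileExtensions;

// Returning a read-only wrapper: callers cannot cast it back to List<string>.
IReadOnlyCollection<string> GetWrapped() => validFileExtensions.AsReadOnly();

var castDirect = GetDirect() as List<string>;
var castWrapped = GetWrapped() as List<string>;

Console.WriteLine(castDirect != null);   // True: the cast succeeds
Console.WriteLine(castWrapped == null);  // True: the cast fails
Console.WriteLine(GetWrapped().Count);   // 2
```

The wrapper is a live view rather than a copy, so later changes made by the owner of the underlying list remain visible through it.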
So in summary, when a method returns a collection:
- Consider IReadOnlyCollection<T> if you always return an in-memory list, and your callers would benefit from having the whole collection already in memory.
- Use IEnumerable<T> where you want to offer callers the ability to partially enumerate through, and potentially generate items in the sequence on the fly.
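As a sketch of the second case, here is a method that generates items on the fly using yield return, so the caller decides how much of the sequence to consume. GenerateOrderNumbers is a hypothetical example, not something from a real API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var generated = 0;

// Hypothetical generator: produces order numbers on demand, with no upper bound.
IEnumerable<int> GenerateOrderNumbers(int start)
{
    var next = start;
    while (true)
    {
        generated++;
        yield return next++;
    }
}

// The caller decides how much of the sequence to consume.
var firstThree = GenerateOrderNumbers(1000).Take(3).ToList();

Console.WriteLine(string.Join(", ", firstThree)); // 1000, 1001, 1002
Console.WriteLine($"Items generated: {generated}");
```

A sequence like this could not be returned as an IReadOnlyCollection<T> at all, since it has no meaningful Count, which is exactly why IEnumerable<T> remains the right return type here.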
Collections as properties on DTOs
What about when we declare data transfer objects (DTOs) that are going to be serialized to JSON, perhaps as part of a web API request or a queue message? I've seen lots of approaches here. Here are some common examples:
class Example1
{
    public IEnumerable<string> People { get; set; }
}

class Example2
{
    public IReadOnlyCollection<string> People { get; set; }
}

class Example3
{
    public ICollection<string> People { get; set; }
}

class Example4
{
    public IList<string> People { get; set; }
}
In the above code samples, Newtonsoft.Json has no problem serializing and deserializing them. It actually deserializes them all to a List<T> except for Example2 where it creates a ReadOnlyCollection. Example1 and Example2 require us to set the entire list ready populated when we create an instance in code, while Example3 and Example4 let us add or remove elements from the People collection after creating the DTO.
Personally, I would avoid IEnumerable<T> (Example1) as it seems unnecessarily restrictive given that we know all the items are available in memory for this type of object. I would prefer IReadOnlyCollection<T> here (Example2), allowing callers to access the Count of items easily.
One nice thing about Newtonsoft.Json is that it can successfully deserialize instances where we don't even put a public setter on the collection property, like this:
class Example5
{
    public Example5(IEnumerable<string> people)
    {
        People = people.ToList();
    }

    public IReadOnlyCollection<string> People { get; }
}
Personally I don't tend to bother with that as it's cumbersome to write on DTOs with many properties. Hopefully if C# 8 record types become a thing, it will be easy to declare an immutable DTO type that supports deserialization.
Summary
There is a huge variety of collection types in .NET and it's not always obvious which is the most appropriate one to use when passing them around in method calls and DTOs. IEnumerable<T> is a good fit for many scenarios, but do consider that IReadOnlyCollection<T> might be a better fit in circumstances where the collection is always going to be fully available in memory. Avoid passing round mutable collection types as this can cause confusion about who owns the collection.
Of course, there is a lot more that could be said about this. I've not touched at all on the newer immutable collections, which are great for a more functional approach to collections, or on Span<T> which is a great fit for high performance mutable collections of primitives. Do feel free to let me know in the comments what approach you take.
Comments
What's an alternative to a method modifying a collection? At some point you may need to add/remove a collection item. Should a method e.g. return a new collection where the new item was removed or added? So that we've kind of an immutable behaviour.
heinz huber

The important thing is that there should be clear ownership of a collection. If a class owns a collection that it adds and removes items from, and there's a method that returns that collection, I'd return it as an IReadOnlyCollection as callers should only get a view onto that collection rather than being allowed to modify it. I guess there might be valid situations in which a method creates and returns a brand new List<T> and then the caller might subsequently want to add and remove items from it, but often there is a better way to achieve that, so I'd try to avoid it.
Mark Heath

I like the idea of using IReadOnlyCollection (when it is reasonable). I also know that some developers prefer using arrays as return types over IEnumerable<T> and List<T>, since IEnumerable<T> has the disadvantages you have described and List<T> is internally backed by an array anyway (not sure about other collection types).
aregaz

I don't really understand why IReadOnlyCollection is "better" than IEnumerable in the case of returning collections from a method. What stops clients of my method from casting the IReadOnlyCollection object to a List and then modifying it directly (if my method makes use of a List of course)? Another question - how does IReadOnlyCollection stop me from returning a collection that is not an in-memory collection?
mnj

Sure, people can try casting the return values into whatever they like, but the return value of any method is essentially a contract telling the caller how they can use that value. If I return an IEnumerable, I'm saying that the caller can't safely multiply enumerate. So if I'm returning an already in-memory list, I prefer to make that obvious. In theory, IReadOnlyCollection could be implemented by something not returning an already in-memory collection, but since it has a Count property, it is used only when the exact number of items in the collection is known up front, meaning that it can be safely enumerated multiple times.
Mark Heath