I recently had to fix a bug caused by incorrect use of Stream.Read, and it struck me that quite often I see bugs and performance issues related to poorly written code reading from streams. This is often in code using NAudio, which has a stream-inspired API, but streams are also very commonly used in all kinds of .NET applications.
So in this post, I want to quickly suggest a few guidelines for better reading from streams in C#.
public abstract int Read(byte[] buffer, int offset, int count);
For reference, the Stream.Read method shown above has been part of .NET since the very beginning. It is still very commonly used, although there are some additional overloads and alternatives, discussed shortly, that are often better choices.
Don't ignore the return value
The Read method returns an integer that indicates how many bytes were actually read from the stream. A common source of bugs is ignoring this return value and assuming it is the same as the count parameter. However, as documented, Read may return fewer bytes than were requested even if the end of the stream has not been reached:
"An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached."
There is now also a method called ReadExactly, introduced in .NET 7, which reads exactly the number of bytes you asked for and throws an EndOfStreamException if the stream ends first. Although there are situations in which this is useful, there are other considerations to weigh before choosing it, which we'll discuss next.
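As a minimal sketch of both approaches (the class and method names here are hypothetical, not part of any library): ReadUntilFull keeps calling Read until the requested number of bytes has arrived or Read returns 0, which signals the end of the stream, while TryReadHeader relies on ReadExactly and so assumes .NET 7 or later.

```csharp
using System;
using System.IO;

public static class StreamReadSketch
{
    // Hypothetical helper: loops until 'count' bytes have been read or the
    // stream ends. Returns the number of bytes actually placed in the buffer.
    public static int ReadUntilFull(Stream stream, byte[] buffer, int offset, int count)
    {
        int totalRead = 0;
        while (totalRead < count)
        {
            // Read may hand back fewer bytes than requested, so ask only for the remainder.
            int bytesRead = stream.Read(buffer, offset + totalRead, count - totalRead);
            if (bytesRead == 0)
            {
                break; // end of stream reached before 'count' bytes arrived
            }
            totalRead += bytesRead;
        }
        return totalRead;
    }

    // Hypothetical helper: fills 'header' completely, or reports failure if
    // the stream ends first. Requires .NET 7 or later for ReadExactly.
    public static bool TryReadHeader(Stream stream, byte[] header)
    {
        try
        {
            stream.ReadExactly(header, 0, header.Length);
            return true;
        }
        catch (EndOfStreamException)
        {
            return false; // the header buffer may be only partially filled here
        }
    }
}
```

Either way, the caller always learns how many bytes really arrived instead of assuming the buffer was filled.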
Avoid trying to read an entire stream in one call to Read
One thing I very commonly see in both NAudio and other C# code that uses streams is developers attempting to read the entire stream in a single call to
Read. This typically happens when we somehow already know how many bytes are (or should be) in the stream.
Aside from the bug mentioned earlier (Read is not guaranteed to return the whole stream even if you ask for it), reading an entire stream in one call to Read often defeats the purpose of using streams in the first place, which is efficiency.
A stream allows us to read data in chunks, processing it bit by bit, removing the need to hold everything in memory and allowing us to stop processing early if it turns out we didn't need everything. By reading an entire stream into memory up front, you miss out on these performance advantages.
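To illustrate, here is a rough sketch of chunked processing with an early exit. The helper name IndexOfByte and the 4096-byte chunk size are arbitrary choices for illustration; the point is that only one small buffer is ever held in memory, and reading stops as soon as the answer is known.

```csharp
using System;
using System.IO;

public static class ChunkedReadSketch
{
    // Hypothetical helper: returns the offset (relative to where the stream
    // was positioned when called) of the first byte equal to 'target',
    // or -1 if the stream ends without a match.
    public static long IndexOfByte(Stream stream, byte target)
    {
        var buffer = new byte[4096]; // one buffer, reused for every chunk
        long chunkStart = 0;
        int bytesRead;
        while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Only search the bytes that were actually read in this call.
            int index = Array.IndexOf(buffer, target, 0, bytesRead);
            if (index >= 0)
            {
                return chunkStart + index; // stop early, the rest is never read
            }
            chunkStart += bytesRead;
        }
        return -1;
    }
}
```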
I should also mention that in most cases it's more appropriate to use the ReadAsync method. This is because with streams there is almost always some kind of network or disk IO going on behind the scenes. Reading asynchronously makes more efficient use of threads and also allows for cancellation.
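A minimal sketch of an asynchronous read loop with cancellation might look like the following. CountBytesAsync is a hypothetical helper and the buffer size is an arbitrary choice; in real code the loop body would process each chunk rather than just count it.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncReadSketch
{
    // Hypothetical helper: reads the stream to the end asynchronously and
    // returns the total number of bytes seen.
    public static async Task<long> CountBytesAsync(
        Stream stream, CancellationToken cancellationToken = default)
    {
        var buffer = new byte[81920]; // reused for every call to ReadAsync
        long total = 0;
        int bytesRead;
        // Passing the token lets the caller abandon a slow read.
        while ((bytesRead = await stream.ReadAsync(
                   buffer, 0, buffer.Length, cancellationToken)) > 0)
        {
            total += bytesRead; // process buffer[0..bytesRead) here
        }
        return total;
    }
}
```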
Avoid unnecessary buffer allocation
Another common problem I see with calls to Stream.Read and related methods is unnecessarily allocating a new memory buffer for every call. Not only is it usually possible to reuse the same buffer on each call to Read, but there is now also support for Span<byte> in Read and Memory<byte> in ReadAsync, giving you more flexibility over the memory you choose to use as a buffer (see Microsoft's Memory<T> and Span<T> usage guidelines to learn more about these types). You could even consider using ArrayPool<byte>.Shared.Rent as an alternative to maintaining your own reusable memory buffer.
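As a rough sketch of the pooled-buffer idea, the hypothetical SumBytes helper below rents a buffer from ArrayPool<byte>.Shared, reuses it for every read via the Span<byte> overload of Read (available since .NET Core 2.1), and returns it in a finally block. The 4096-byte request is an arbitrary choice, and the pool may hand back a larger array than requested.

```csharp
using System;
using System.Buffers;
using System.IO;

public static class PooledReadSketch
{
    // Hypothetical helper: sums every byte in the stream using a rented buffer.
    public static long SumBytes(Stream stream)
    {
        // Rent may return an array larger than the requested minimum length.
        byte[] rented = ArrayPool<byte>.Shared.Rent(4096);
        try
        {
            long sum = 0;
            int bytesRead;
            while ((bytesRead = stream.Read(rented.AsSpan())) > 0)
            {
                // Only the first bytesRead bytes are valid; the rest may be stale.
                for (int i = 0; i < bytesRead; i++)
                {
                    sum += rented[i];
                }
            }
            return sum;
        }
        finally
        {
            // Always hand the buffer back, even if reading throws.
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

A rented array can contain data left over from a previous user, which is one more reason to respect the value returned by Read.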
A stream-based programming model allows you to write efficient code to process large amounts of data, but only if you take care to do so. Hopefully this post has provided a few helpful pointers, and let me know in the comments if you have any additional recommendations of your own.