Testable Code with Pure Functions

One of the great benefits of writing code in a functional style using “pure functions” is that it makes your code much easier to test. In case you’ve forgotten, pure functions are functions that have no “side effects”. Their output depends only on their input. So if you call a pure function twice with the same parameters, it’s guaranteed to return the same answer.

But how do you adjust your programming style to make use of pure functions? Well, I recently had to write a program that copied files from one folder to another. Only, it needed to reorganize them based on their filename. Files that started similarly needed to be grouped together into a folder. There’s probably an awesome algorithm I could have used, but my code ended up getting a little bit complicated and I knew I needed a good suite of unit tests to make sure all the requirements were met.

Now, in my early days of C# programming, I’d probably have tackled that problem something like this:

foreach (var inputFile in Directory.GetFiles(inputFolder, "*.*"))
{
    var outputFolder = // complicated code to work out where to put this
    if (!Directory.Exists(outputFolder))
        Directory.CreateDirectory(outputFolder);
    var outputFile = Path.Combine(outputFolder, Path.GetFileName(inputFile));    
    File.Copy(inputFile, outputFile);
}

What we have here is code that can’t easily be unit tested. We need an actual folder of input files somewhere on disk, and when we run the function, we need to verify it by checking the contents of the output folder.

Now if you’ve used a mocking framework, then your first instinct might be to create an interface to abstract away the file system access, which will allow us to mock out everything with side effects:

interface IFileSystem
{
    IEnumerable<string> GetFiles(string folder);
    bool DirectoryExists(string folder);
    void CreateDirectory(string folder);
    void Copy(string source, string destination);
}

Now we can update our method and make it testable like so:

private void SortFiles(string inputFolder, IFileSystem fileSystem)
{
    foreach (var inputFile in fileSystem.GetFiles(inputFolder))
    {
        var outputFolder = // code to work out where to put this
        if (!fileSystem.DirectoryExists(outputFolder))
            fileSystem.CreateDirectory(outputFolder);
        var outputFile = Path.Combine(outputFolder, Path.GetFileName(inputFile));
        fileSystem.Copy(inputFile, outputFile);
    }
}

And this works. Now we can mock IFileSystem and unit test to our hearts’ content (note that I didn’t include Path.Combine and Path.GetFileName in my IFileSystem interface, as these don’t have side effects and should behave as pure functions). However, this kind of code is a pain to write and maintain. Not only have we had to create an interface and a concrete implementation, but our tests get bloated with a load of mock object configuration, which is hard to read at the best of times.
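To give a feel for that overhead, here is a rough sketch of such a test. It uses a hand-written fake rather than a mocking framework, so that it runs without extra packages. The FakeFileSystem and Sorter names are hypothetical, the "out" root folder is made up, and the "first three characters of the file name" rule is only a stand-in for the real folder-choosing logic, which isn't shown in this post.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public interface IFileSystem
{
    IEnumerable<string> GetFiles(string folder);
    bool DirectoryExists(string folder);
    void CreateDirectory(string folder);
    void Copy(string source, string destination);
}

// Hypothetical hand-written fake: records calls instead of touching the disk.
public class FakeFileSystem : IFileSystem
{
    public readonly List<string> Files = new List<string>();
    public readonly HashSet<string> Directories = new HashSet<string>();
    public readonly List<Tuple<string, string>> Copies = new List<Tuple<string, string>>();

    public IEnumerable<string> GetFiles(string folder) { return Files; }
    public bool DirectoryExists(string folder) { return Directories.Contains(folder); }
    public void CreateDirectory(string folder) { Directories.Add(folder); }
    public void Copy(string source, string destination)
    {
        Copies.Add(Tuple.Create(source, destination));
    }
}

public static class Sorter
{
    public static void SortFiles(string inputFolder, IFileSystem fileSystem)
    {
        foreach (var inputFile in fileSystem.GetFiles(inputFolder))
        {
            // Assumed stand-in rule: folder named after the first three characters.
            var fileName = Path.GetFileName(inputFile);
            var prefix = fileName.Substring(0, Math.Min(3, fileName.Length));
            var outputFolder = Path.Combine("out", prefix);
            if (!fileSystem.DirectoryExists(outputFolder))
                fileSystem.CreateDirectory(outputFolder);
            fileSystem.Copy(inputFile, Path.Combine(outputFolder, fileName));
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Arrange: set up the fake's state.
        var fake = new FakeFileSystem();
        fake.Files.Add(Path.Combine("in", "abc-one.txt"));
        fake.Files.Add(Path.Combine("in", "abc-two.txt"));

        // Act.
        Sorter.SortFiles("in", fake);

        // Assert: inspect what the fake recorded.
        if (fake.Copies.Count != 2)
            throw new InvalidOperationException("Expected two copies");
        if (!fake.Directories.Contains(Path.Combine("out", "abc")))
            throw new InvalidOperationException("Expected folder to be created");
        if (fake.Copies[1].Item2 != Path.Combine("out", "abc", "abc-two.txt"))
            throw new InvalidOperationException("Unexpected destination");
        Console.WriteLine("All checks passed");
    }
}
```

Even with a trivial rule, most of this listing is plumbing for the fake rather than a test of the logic itself.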

But what if we thought about this problem differently? What if, instead of interacting with the disk at all, we changed our function to take an IEnumerable of input file paths and return an IEnumerable sequence of copies that need to be made? (Here I’ve used a tuple of source and destination, but I’m still waiting for C# to make tuples nicer to work with!) Now our method would look like this:

private IEnumerable<Tuple<string,string>> SortFilesPure(IEnumerable<string> inputFiles)
{
    foreach (var inputFile in inputFiles)
    {
        var outputFolder = // code to work out where to put this
        var outputFile = Path.Combine(outputFolder, Path.GetFileName(inputFile));
        yield return Tuple.Create(inputFile, outputFile);
    }
}

Now suddenly this becomes trivial to test. We just pass in a list of input file paths, and check the paths where it thinks they should be copied to. (By the way, the reason I’ve made my method testable at this level rather than on a per-file basis is that my algorithm was multi-pass and needed to examine all the input filenames before it could work out where to copy any one of them.)
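As a rough illustration of how small such a test can be, here is a sketch. The real multi-pass algorithm isn't shown in this post, so the code assumes a deliberately simple stand-in rule: group files by the first three characters of their name. The FileSorter class name, the outputRoot parameter and that grouping rule are all hypothetical, and the checks use plain exceptions rather than any particular test framework.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileSorter
{
    // Pure: the result depends only on the arguments, and nothing on disk is read or written.
    public static IEnumerable<Tuple<string, string>> SortFilesPure(
        IEnumerable<string> inputFiles, string outputRoot)
    {
        foreach (var inputFile in inputFiles)
        {
            // Assumed stand-in rule: folder named after the first three characters.
            var fileName = Path.GetFileName(inputFile);
            var prefix = fileName.Substring(0, Math.Min(3, fileName.Length));
            var outputFolder = Path.Combine(outputRoot, prefix);
            yield return Tuple.Create(inputFile, Path.Combine(outputFolder, fileName));
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var inputs = new[]
        {
            Path.Combine("in", "abc-one.txt"),
            Path.Combine("in", "abc-two.txt"),
            Path.Combine("in", "xyz-one.txt"),
        };

        var copies = FileSorter.SortFilesPure(inputs, "out").ToList();

        // No disk access happened: we are only comparing strings.
        Check(copies.Count == 3);
        Check(copies[0].Item1 == inputs[0]);
        Check(copies[0].Item2 == Path.Combine("out", "abc", "abc-one.txt"));
        Check(copies[1].Item2 == Path.Combine("out", "abc", "abc-two.txt"));
        Check(copies[2].Item2 == Path.Combine("out", "xyz", "xyz-one.txt"));
        Console.WriteLine("All checks passed");
    }

    private static void Check(bool condition)
    {
        if (!condition) throw new InvalidOperationException("Check failed");
    }
}
```

There is no fake, no interface and no set-up beyond an array of strings.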

And there will of course be a non-pure method that simply loops through each of these “copies” and applies them by calling File.Copy. But what we’ve done is moved all the business logic – in our case the algorithm that works out where a file should be copied to – into a testable pure function. The remaining code is simply side effects and can be proved out very easily by running it for real.
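For completeness, that non-pure half might look something like the sketch below. The CopyApplier and ApplyCopies names are hypothetical, and creating the destination folder here is my assumption about where that responsibility would end up once it has been removed from the sorting method.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CopyApplier
{
    // Hypothetical helper: performs the side effects described by each
    // (source, destination) pair. No decisions are made here.
    public static void ApplyCopies(IEnumerable<Tuple<string, string>> copies)
    {
        foreach (var copy in copies)
        {
            var destinationFolder = Path.GetDirectoryName(copy.Item2);
            if (!string.IsNullOrEmpty(destinationFolder))
                Directory.CreateDirectory(destinationFolder); // does nothing if it already exists
            File.Copy(copy.Item1, copy.Item2); // throws if the destination file already exists
        }
    }
}
```

Wiring the two halves together would then be a matter of passing the output of SortFilesPure (fed with Directory.GetFiles(inputFolder)) straight into ApplyCopies.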

Hopefully this gives you a simple idea of how you can make use of pure functions to separate the logic from the side-effect-producing parts of your code, and make your tests a lot simpler.