Serverless Media Processing with Azure Container Instances

One of the recurring themes throughout my career is the need to perform media processing, often using a tool like FFMPEG, or something using my NAudio library.

Now, depending on your use case, it might be that a service like Azure Media Services already has the capability to perform the transcodes you require. But for the purposes of this post, let's assume that we need to use FFMPEG.

And let's add two more requirements. We want a serverless pricing model - that is, we don't want to pay for compute that's sitting idle; we want to pay only for what we use. And we want to be able to perform multiple transcodes in parallel - so if a batch of jobs comes in simultaneously, we don't have to wait for them to complete sequentially.

What are our options?

Since transcoding can be a time-consuming process, often the best approach is to make it an asynchronous operation. Put each transcode job in a queue, and then have a listener taking messages off the queue and performing the transcode. This way we can add additional queue listeners to work through a backlog quickly.

If we take this approach, then we have multiple options for how to host the queue listening process.

We could use Virtual Machine Scale Sets. Here we have a pool of virtual machines that can automatically scale based on how big the queue backlog is. I've used this approach successfully for media processing tasks, but it does have some limitations. The auto-scaling logic is quite primitive and it's cumbersome to write your own. And depending on what your startup scripts involve, it can be painfully slow for a new VM to come online and start processing work.

Alternatively, Azure Batch would be a good fit for this scenario. It's specifically designed for working through queues of jobs. It's a service I've not yet had a chance to try out, but what has put me off so far is that it seems to have quite a cumbersome programming model. I'm looking for something a bit more lightweight.

You might think that given my enthusiasm for Azure Functions I'd be recommending that as a serverless approach. It certainly is possible to call FFMPEG in an Azure Function (I've got some proofs of concept that I hope to share on this blog in the near future). But there are a few limitations of Azure Functions that make it less suitable for this kind of task. First of all, the consumption plan on Azure Functions is not intended for long-running tasks (which transcodes often are), and by default it times out after 5 minutes. You can of course switch to an App Service plan, but then you're no longer on a serverless pricing model. And secondly, many FFMPEG tasks work best on local files, and you can't mount an Azure File Share to a Function App, so you'd need to download the input file to the local disk, transcode, and then copy the resulting file back out afterwards.

Serverless transcode with Azure Container Instances

So how can Azure Container Instances help us create a serverless transcoding engine? They have a few features that make them well suited to this task:

They are based on containers - making it very easy to package the transcode tool with its dependencies
You can attach Azure File Shares, which is great for situations where the transcode task doesn't work with a URI input
They have a serverless pricing model - you pay only for the seconds the container is running
They are fast to start up. It's not instantaneous, but a lot faster than waiting for a VM to spin up.
They don't limit the duration they run for. Long transcodes are fine.
You can specify the CPU and RAM requirements. Some codecs go faster with multiple cores, while others don't benefit from additional cores, so you could customize the number of cores depending on the file type.
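As a rough sketch of that last point, the core count could be chosen per file before the container is created. The helper names and the extension-to-core mapping below are hypothetical and purely illustrative (they are not measured recommendations); the only real CLI options assumed are --cpu (a core count) and --memory (in GB) on az container create.

```powershell
# Hypothetical helper: maps a file extension to a core count.
# The mapping is an illustrative assumption, not a benchmark result.
function Get-TranscodeCpuCount {
    param([Parameter(Mandatory)][string]$FileName)
    $extension = [System.IO.Path]::GetExtension($FileName).ToLowerInvariant()
    switch ($extension) {
        ".mp4"  { 4 }   # assumed: a video encode that scales across cores
        ".mkv"  { 4 }
        default { 1 }   # assumed: audio or other work that won't use extra cores
    }
}

# Builds the sizing arguments for az container create.
# MemoryGbPerCore is an arbitrary default chosen for this sketch.
function Get-AciSizeArguments {
    param(
        [Parameter(Mandatory)][string]$FileName,
        [double]$MemoryGbPerCore = 1.5
    )
    $cpu = Get-TranscodeCpuCount -FileName $FileName
    @("--cpu", "$cpu", "--memory", "$($cpu * $MemoryGbPerCore)")
}
```

The resulting array can be stored in a variable such as $sizeArgs and splatted onto the command line, for example az container create -g $resourceGroup -n $containerGroupName --image jrottenberg/ffmpeg @sizeArgs.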

Are there any downsides? Well, if left to run 24/7, Azure Container Instances are a couple of times more expensive than the equivalent VM. So if you have a consistently high load then other approaches are cheaper.

Another drawback of Azure Container Instances is that there is no Event Grid notification to tell you when your container completes. So you'd need to implement your own notification or polling mechanism to determine that the transcode was complete.
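A polling mechanism can be quite small. The following is a minimal sketch, assuming a container group with a single container: it repeatedly asks az container show for the container's current state and returns the exit code once the state is Terminated. The function name Wait-AciContainer and its parameters are my own invention for illustration, and the state lookup is passed in as a script block so the loop can be tried out without touching Azure.

```powershell
# Polls until the first container in the group has terminated, then returns its exit code.
# Wait-AciContainer is a hypothetical helper, not part of the Azure CLI.
function Wait-AciContainer {
    param(
        [Parameter(Mandatory)][string]$ResourceGroup,
        [Parameter(Mandatory)][string]$Name,
        [int]$PollSeconds = 5,
        [int]$MaxPolls = 120,
        # Default lookup: read the first container's current state via the Azure CLI.
        [scriptblock]$GetState = {
            param($rg, $n)
            az container show -g $rg -n $n `
                --query "containers[0].instanceView.currentState" -o json |
                ConvertFrom-Json
        }
    )
    for ($i = 0; $i -lt $MaxPolls; $i++) {
        $state = & $GetState $ResourceGroup $Name
        if ($state -and $state.state -eq "Terminated") {
            return $state.exitCode
        }
        Start-Sleep -Seconds $PollSeconds
    }
    throw "Container group '$Name' did not terminate after $MaxPolls polls"
}
```

Calling Wait-AciContainer -ResourceGroup $resourceGroup -Name $containerGroupName after creating the container would block until the transcode finishes; a non-zero exit code suggests FFMPEG failed and the logs are worth a look.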

A worked example

Let's see what it takes to implement a serverless media processing engine based on Azure Container Instances. We'll create an Azure File Share, upload a video to it, and then use a basic FFMPEG docker image to extract a thumbnail image from the video and save it into the file share. I'm using PowerShell and the Azure CLI for my example scripts.

First we'll create a resource group to work in:

$resourceGroup = "AciTranscodeDemo"
$location = "westeurope"
az group create -n $resourceGroup -l $location

And now let's create a storage account (with a random name):

$storageAccountName = "acishare$(Get-Random -Minimum 1000 -Maximum 10000)"
az storage account create -g $resourceGroup -n $storageAccountName `
    --sku Standard_LRS

We need to create an environment variable containing the storage account connection string to simplify the az storage share commands:

$storageConnectionString = az storage account show-connection-string -n $storageAccountName -g $resourceGroup --query connectionString -o tsv
$env:AZURE_STORAGE_CONNECTION_STRING = $storageConnectionString

Now we can create our file share:

$shareName="acishare"
az storage share create -n $shareName

And upload a test video file into that share:

$filename = "intro.mp4"
$localFile = "C:\Users\markh\Pictures\$filename"
az storage file upload -s $shareName --source "$localFile"

That's all the setup we need. Now we're ready to create an Azure Container Instance that will perform the thumbnail image extraction. We will use az container create with a few key options:

We specify the Docker --image to use. In our case it's just a standard FFMPEG container. We don't need our own custom image as we'll pass in the command line.
We set the --restart-policy to never. Whether it succeeds or fails, we don't want to restart once the transcode finishes.
We use the --azure-file-volume-* arguments to specify the details of the file share to mount as a volume.
We use --command-line to specify the command line arguments for ffmpeg. We're pointing to the file in our mounted share as well as setting the output to write into that share.

$storageKey=$(az storage account keys list -g $resourceGroup --account-name $storageAccountName --query "[0].value" --output tsv)
$containerGroupName = "transcode"
az container create `
    -g $resourceGroup `
    -n $containerGroupName `
    --image jrottenberg/ffmpeg `
    --restart-policy never `
    --azure-file-volume-account-name $storageAccountName `
    --azure-file-volume-account-key $storageKey `
    --azure-file-volume-share-name $shareName `
    --azure-file-volume-mount-path "/mnt/azfile" `
    --command-line "ffmpeg -i /mnt/azfile/$filename -vf  ""thumbnail,scale=640:360"" -frames:v 1 /mnt/azfile/thumb.png"

We can query for the current state of the container group, as well as look at the logs:

az container show -g $resourceGroup -n $containerGroupName
az container logs -g $resourceGroup -n $containerGroupName

Assuming all worked well, there will be a new thumb.png file in our Azure File Share. You can download it with the Azure CLI or look at it in Azure Storage Explorer. On the test video I used, which was 22 minutes long, my container only ran for 2 seconds to perform this particular task so it barely cost me anything.
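For completeness, here is a short sketch of fetching the result and tidying up with the Azure CLI. It reuses the variables defined earlier and assumes the AZURE_STORAGE_CONNECTION_STRING environment variable is still set in the current session.

```powershell
# Download the generated thumbnail from the file share to a local file.
az storage file download -s $shareName -p "thumb.png" --dest "thumb.png"

# Remove the finished container group; --yes skips the confirmation prompt.
az container delete -g $resourceGroup -n $containerGroupName --yes

# When you're done experimenting, deleting the resource group removes everything
# created in this walkthrough, storage account included. Left commented out on purpose.
# az group delete -n $resourceGroup --yes --no-wait
```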

A hybrid approach

Would I actually implement a serverless media processing platform using this approach? Well, I probably wouldn't forgo queues completely and start a new ACI container for every single transcode task, as the service quotas and limits for ACI are not particularly generous. You can't create hundreds of them every second.

So at the very least I'd probably implement a queue which the container checked when it started up. It could then perform repeated transcodes until the queue was empty, before exiting. This would increase efficiency as we'd not need to pay the container startup cost for every transcode.
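Here is one way that drain-the-queue loop might look. This is only a sketch under several assumptions: the container runs a custom image containing PowerShell, the Azure CLI and FFMPEG (the stock FFMPEG image used earlier would not be enough), the jobs sit in an Azure Storage queue with the hypothetical name transcodes, each message's content is simply the name of a file in the mounted share, and AZURE_STORAGE_CONNECTION_STRING is available inside the container. The function names are made up for illustration.

```powershell
# Generic drain loop: fetch a job, process it, mark it complete, stop when the queue is empty.
# The three steps are injected as script blocks so the loop has no Azure dependency.
function Invoke-QueueDrain {
    param(
        [Parameter(Mandatory)][scriptblock]$Receive,    # returns the next message, or nothing when empty
        [Parameter(Mandatory)][scriptblock]$Transcode,  # performs the work for one message
        [Parameter(Mandatory)][scriptblock]$Complete    # removes the message once processed
    )
    $processed = 0
    while ($true) {
        $message = & $Receive
        if (-not $message) { break }
        & $Transcode $message | Out-Null
        & $Complete $message | Out-Null
        $processed++
    }
    return $processed
}

# Hypothetical wiring to an Azure Storage queue and the thumbnail command from this post.
function Start-TranscodeWorker {
    param(
        [string]$QueueName = "transcodes",   # assumed queue name
        [string]$MountPath = "/mnt/azfile"
    )
    Invoke-QueueDrain `
        -Receive {
            # Hide the message for 10 minutes while we work on it.
            (az storage message get -q $QueueName --visibility-timeout 600 -o json |
                ConvertFrom-Json) | Select-Object -First 1
        } `
        -Transcode {
            param($message)
            $source = "$MountPath/$($message.content)"
            ffmpeg -i $source -vf "thumbnail,scale=640:360" "-frames:v" 1 "$source.png"
        } `
        -Complete {
            param($message)
            az storage message delete -q $QueueName --id $message.id --pop-receipt $message.popReceipt
        }
}
```

Keeping the loop separate from the Azure calls is a deliberate choice: the same loop could sit in front of a different queue technology, and it can be tested with in-memory fakes. A real worker would also need to decide what to do when FFMPEG fails, which this sketch ignores.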

It would also allow a hybrid approach where some more cost-effective Virtual Machines could be spun up when load is very high to cheaply burn through heavy workloads, but the speed and agility of ACI could be used to handle sudden peaks in load. This is very similar to what the ACI Virtual Kubelet offers for Kubernetes - an elastic pool of virtual nodes that can handle bursts of work, while regular Virtual Machines handle the steady day-to-day load.

Summary

There are loads of ways you could implement a media processing engine in Azure, but if a serverless model appeals to you, ACI is a great way of achieving that, and could be used in a hybrid approach to get the best of both worlds with regard to cost-effectiveness and scalability.