
One of the recurring themes throughout my career has been the need to perform media processing, often using a tool like FFMPEG, or something built with my NAudio library.

Now, depending on your use case, it might be that a service like Azure Media Services already has the capability to perform the transcodes you require. But for the purposes of this post, let's assume that we need to use FFMPEG.

And let's add two more requirements. We want a serverless pricing model - that is we don't want to pay for compute that's sat idle, we want to pay only for what we use. And we want to be able to perform multiple transcodes in parallel - so if a batch of jobs come in simultaneously we don't have to wait for them to complete sequentially.

What are our options?

Since transcoding can be a time-consuming process, often the best approach is to make it an asynchronous operation. Put each transcode job in a queue, and then have a listener taking messages off the queue and performing the transcode. This way we can add additional queue listeners to work through a backlog quickly.

If we take this approach, then we have multiple options for how to host the queue listening process.

We could use Virtual Machine Scale Sets. Here we have a pool of virtual machines that can automatically scale based on how big the queue backlog is. I've used this approach successfully for media processing tasks, but it does have some limitations. The auto-scaling logic is quite primitive and it's cumbersome to write your own. And depending on what your startup scripts involve, it can be painfully slow for a new VM to come online and start processing work.

Alternatively, Azure Batch would be a good fit for this scenario. It's specifically designed for working through queues of jobs. It's a service I've not yet had a chance to try out, but what has put me off so far is that it seems quite a cumbersome programming model. I'm looking for something a bit more lightweight.

You might think that given my enthusiasm for Azure Functions I'd be recommending that as a serverless approach. It certainly is possible to call FFMPEG in an Azure Function (I've got some proofs of concept that I hope to share on this blog in the near future). But there are a few limitations of Azure Functions that make it less suitable for this kind of task. First of all, the consumption plan on Azure Functions is not intended for long-running tasks (which transcodes often are), and by default times you out after 5 minutes. You can of course switch to an App Service plan, but then you're not paying a serverless pricing model. And secondly, many FFMPEG tasks work best on local files, and you can't mount an Azure File Share to a Function App, so you'd need to download the input file to the local disk, transcode, and then copy the resulting file back out afterwards.
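
As an aside, the consumption plan timeout can be raised a little beyond the default by setting functionTimeout in host.json (at the time of writing, the maximum on the consumption plan is ten minutes), but that still isn't enough headroom for many transcode jobs. A minimal sketch of that setting:

{
  "functionTimeout": "00:10:00"
}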

Serverless transcode with Azure Container Instances

So how can Azure Container Instances help us create a serverless transcoding engine? They have a few features that make them well suited to this task:

  • They are based on containers - making it very easy to package the transcode tool with its dependencies
  • You can attach Azure File Shares, which is great for situations where the transcode task doesn't work with a URI input
  • They have a serverless pricing model - you pay only for how many seconds the container is running for
  • They are fast to start up. It's not instantaneous, but a lot faster than waiting for a VM to spin up.
  • They don't limit the duration they run for. Long transcodes are fine.
  • You can specify the CPU and RAM requirements. Some codecs go faster with multiple cores, while others don't benefit from additional cores, so you could customize the number of cores depending on the file type (see the example just after this list).
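
For instance, the cores and memory for a container group are chosen at creation time with the --cpu and --memory (GB) flags on az container create. A minimal sketch with illustrative values (you'd add these flags to the full create command shown later in this post):

az container create -g $resourceGroup -n $containerGroupName `
    --image jrottenberg/ffmpeg `
    --cpu 4 --memory 8 `
    --restart-policy never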

Are there any down-sides? Well if left to run 24/7, Azure Container Instances are a couple of times more expensive than the equivalent VM. So if you have a consistently high load then other approaches are cheaper.

Another limitation of Azure Container Instances is that there is no Event Grid notification to tell you when your container completes, so you'd need to implement your own notification or polling mechanism to determine that the transcode has finished.

A worked example

Let's see what it takes to implement a serverless media processing engine based on Azure Container Instances. We'll create an Azure File Share, upload a video to it, and then use a basic FFMPEG docker image to extract a thumbnail image from the video and save it into the file share. I'm using PowerShell and the Azure CLI for my example scripts.

First we'll create a resource group to work in:

$resourceGroup = "AciTranscodeDemo"
$location = "westeurope"
az group create -n $resourceGroup -l $location

And now let's create a storage account (with a random name):

$storageAccountName = "acishare$(Get-Random -Minimum 1000 -Maximum 10000)"
az storage account create -g $resourceGroup -n $storageAccountName `
    --sku Standard_LRS

We need to create an environment variable containing the storage account connection string to simplify the az storage share commands:

$storageConnectionString = az storage account show-connection-string -n $storageAccountName -g $resourceGroup --query connectionString -o tsv
$env:AZURE_STORAGE_CONNECTION_STRING = $storageConnectionString

Now we can create our file share:

$shareName = "videos" # any valid share name will do
az storage share create -n $shareName

And upload a test video file into that share:

$filename = "intro.mp4"
$localFile = "C:\Users\markh\Pictures\$filename"
az storage file upload -s $shareName --source "$localFile"

That's all the setup we need. Now we're ready to create an Azure Container Instance that will perform the thumbnail image extraction. We will use az container create with a few key options:

  • We specify the Docker --image to use. In our case it's just a standard FFMPEG container; we don't need our own custom image as we'll pass in the command line.
  • We set the --restart-policy to never. Whether it succeeds or fails, we don't want to restart once the transcode finishes.
  • We use the --azure-file-volume-* arguments to specify the details of the file share to mount as a volume.
  • We use --command-line to specify the command line arguments for ffmpeg. We're pointing to the file in our mounted share as well as setting the output to write into that share.

$storageKey = $(az storage account keys list -g $resourceGroup --account-name $storageAccountName --query "[0].value" --output tsv)
$containerGroupName = "transcode"
az container create `
    -g $resourceGroup `
    -n $containerGroupName `
    --image jrottenberg/ffmpeg `
    --restart-policy never `
    --azure-file-volume-account-name $storageAccountName `
    --azure-file-volume-account-key $storageKey `
    --azure-file-volume-share-name $shareName `
    --azure-file-volume-mount-path "/mnt/azfile" `
    --command-line "ffmpeg -i /mnt/azfile/$filename -vf ""thumbnail,scale=640:360"" -frames:v 1 /mnt/azfile/thumb.png"

We can query for the current state of the container group, as well as look at the logs:

az container show -g $resourceGroup -n $containerGroupName
az container logs -g $resourceGroup -n $containerGroupName 
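
And since, as mentioned earlier, there's no built-in completion notification, the simplest option is to poll that state until it reaches a terminal value. Here's a minimal sketch, assuming the container group's instanceView.state property moves from "Running" to "Succeeded" (or "Failed") once the container exits:

# poll every 10 seconds until the container group leaves the Pending/Running states
do {
    Start-Sleep -Seconds 10
    $state = az container show -g $resourceGroup -n $containerGroupName `
             --query "instanceView.state" -o tsv
    Write-Host "Container group state: $state"
} while (($state -eq "Pending") -or ($state -eq "Running"))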

Assuming all worked well, there will be a new thumb.png file in our Azure File Share. You can download it with the Azure CLI or look at it in Azure Storage Explorer. On the test video I used, which was 22 minutes long, my container only ran for 2 seconds to perform this particular task so it barely cost me anything.
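
For example, downloading the generated thumbnail with the Azure CLI looks like this:

az storage file download --share-name $shareName --path thumb.png --dest thumb.png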

A hybrid approach

Would I actually implement a serverless media processing platform using this approach? Well, I probably wouldn't completely forgo queues and just start an ACI container for every single transcode task, as the service quotas and limits for ACI are not particularly generous. You can't create hundreds of them every second.

So at the very least I'd probably implement a queue which the container checked when it started up. It could then perform repeated transcodes until the queue was empty, before exiting. This would increase efficiency as we'd not need to pay the startup time for the container for every transcode.
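
As a rough sketch of that idea (shown in PowerShell to match the rest of this post, although the container's startup script could just as easily be bash; the transcode-jobs queue name and the message format are hypothetical):

# drain the queue, then exit so the container stops and we stop paying for it
# assumes AZURE_STORAGE_CONNECTION_STRING is set and each message body is an input file name
while ($true) {
    $msg = az storage message get --queue-name transcode-jobs -o json | ConvertFrom-Json
    if (-not $msg) { break }
    $inputFile = $msg[0].content
    ffmpeg -i "/mnt/azfile/$inputFile" -vf "thumbnail,scale=640:360" -frames:v 1 "/mnt/azfile/$inputFile.png"
    az storage message delete --queue-name transcode-jobs --id $msg[0].id --pop-receipt $msg[0].popReceipt
}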

It would also allow a hybrid approach where some more cost-effective Virtual Machines could be spun up when load is very high to cheaply burn through heavy workloads, but the speed and agility of ACI could be used to handle sudden peaks in load. This is very similar to what the ACI Virtual Kubelet offers for Kubernetes - an elastic pool of virtual nodes that can handle bursts of work, while regular Virtual Machines handle the steady day-to-day load.


There are loads of ways you could implement a media processing engine in Azure, but if a serverless model appeals to you, ACI is a great way of achieving that, and could be used in a hybrid approach to get the best of both worlds with regards to cost-effectiveness and scalability.

Want to learn more about how to build serverless applications in Azure? Be sure to check out my Pluralsight course Building Serverless Applications in Azure.


Suppose in C# we have a number of tasks to perform that we're currently doing sequentially, but would like to speed up by running them in parallel. As a trivial example, imagine we're downloading a bunch of web pages like this:

var urls = new [] {
    "https://github.com", "https://stackoverflow.com",      // example URLs
    "https://www.microsoft.com", "https://azure.microsoft.com" };
var client = new HttpClient();
foreach (var url in urls)
{
    var html = await client.GetStringAsync(url);
    Console.WriteLine($"retrieved {html.Length} characters from {url}");
}

To parallelize this, we could just turn every single download into a separate Task with Task.Run and wait for them all to complete, but what if we wanted to limit the number of concurrent downloads? Let's say we only want 4 downloads to happen at a time.

In this trivial example, the exact number might not matter too much, but it's not hard to imagine a situation in which you would want to avoid too many concurrent calls to a downstream service.
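
For reference, the completely unconstrained version (the thing we're trying to avoid) just fires off every download at once and waits for them all; GetStringAsync already returns a Task, so we don't even need Task.Run:

// no throttling: every request is in flight simultaneously
var allDownloads = urls.Select(url => client.GetStringAsync(url)).ToList();
await Task.WhenAll(allDownloads);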

In this post I'll look at four different ways of solving this problem.

Technique 1 - ConcurrentQueue

The first technique has been my go-to approach for many years. The basic idea is to put the work onto a queue and then have multiple threads reading off that queue. This is a nice simple approach, but it does require the queue to be thread safe as it will be accessed by multiple threads, so in this example I'm using ConcurrentQueue to give us that thread safety.

We fill the queue with all the urls to download, and then start one Task for each thread that simply sits in a loop trying to read from the queue, and exits when there are no more items left in the queue. We put each of these queue reader tasks in a list and then use Task.WhenAll to wait for them all to exit, which will happen once the final download has completed.

var maxThreads = 4;
var q = new ConcurrentQueue<string>(urls);
var tasks = new List<Task>();
for (int n = 0; n < maxThreads; n++)
{
    tasks.Add(Task.Run(async () =>
    {
        while (q.TryDequeue(out string url))
        {
            var html = await client.GetStringAsync(url);
            Console.WriteLine($"retrieved {html.Length} characters from {url}");
        }
    }));
}
await Task.WhenAll(tasks);

I still like this approach as it's conceptually simple. But it can be a bit of a pain if more work is still being generated after processing has started, as the reader threads could exit too early.

Technique 2 - SemaphoreSlim

Another approach (inspired by this StackOverflow answer) is to use a SemaphoreSlim with an initialCount equal to the maximum number of threads, and then use WaitAsync to wait until it's OK to queue up another task. So we immediately kick off four tasks, but then have to wait for the first of those to finish before we get past WaitAsync and start the next.

var allTasks = new List<Task>();
var throttler = new SemaphoreSlim(initialCount: maxThreads);
foreach (var url in urls)
{
    await throttler.WaitAsync();
    allTasks.Add(Task.Run(async () =>
    {
        try
        {
            var html = await client.GetStringAsync(url);
            Console.WriteLine($"retrieved {html.Length} characters from {url}");
        }
        finally { throttler.Release(); }
    }));
}
await Task.WhenAll(allTasks);

The code here is a bit more verbose than the ConcurrentQueue approach and also ends up with a potentially huge list containing mostly completed Tasks, but this approach does have an advantage if you are generating the tasks to be completed at the same time you are executing them.

For example, to upload a large file to Azure blob storage you might read 1MB chunks sequentially, but want to upload up to four of them in parallel. You don't want to read all the chunks in advance of uploading them, as that uses a lot of time and memory before the upload can even start. With this approach we can generate the work to be done just in time, as threads become available for uploading, which is more efficient.
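
Here's a sketch of that just-in-time pattern using the same SemaphoreSlim throttling, where UploadChunkAsync is a hypothetical helper standing in for whatever blob upload call you're using, and bigfile.bin is a placeholder file name:

// read 1MB chunks just-in-time; at most maxThreads uploads are in flight at once
var throttler = new SemaphoreSlim(initialCount: maxThreads);
var uploads = new List<Task>();
var buffer = new byte[1024 * 1024];
int bytesRead, chunkIndex = 0;
using (var file = File.OpenRead("bigfile.bin"))
{
    while ((bytesRead = await file.ReadAsync(buffer, 0, buffer.Length)) > 0)
    {
        var chunk = buffer.Take(bytesRead).ToArray(); // copy, because the buffer gets reused
        var index = chunkIndex++;
        await throttler.WaitAsync(); // don't read the next chunk until an upload slot is free
        uploads.Add(Task.Run(async () =>
        {
            try { await UploadChunkAsync(index, chunk); } // hypothetical upload method
            finally { throttler.Release(); }
        }));
    }
}
await Task.WhenAll(uploads);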

Technique 3 - Parallel.ForEach

The Parallel.ForEach method at first appears to be the perfect solution to this problem. You can simply specify the MaxDegreeOfParallelism and then provide an Action to perform on each item in your IEnumerable:

var options = new ParallelOptions() { MaxDegreeOfParallelism = maxThreads };
Parallel.ForEach(urls, options, url =>
{
    var html = client.GetStringAsync(url).Result;
    Console.WriteLine($"retrieved {html.Length} characters from {url}");
});

Looks nice and simple, doesn't it? However, there is a nasty gotcha here. Because Parallel.ForEach takes an Action, not a Func<Task>, it should only be used to call synchronous methods. You might notice we've ended up putting a .Result after GetStringAsync, which is a dangerous antipattern.

So unfortunately, this method should only be used if you have a synchronous method you want to perform in parallel. There is a NuGet package that implements an asynchronous version of Parallel.ForEach so you could try that if you'd like to write something like this instead:

await urls.ParallelForEachAsync(
    async url =>
    {
        var html = await client.GetStringAsync(url);
        Console.WriteLine($"retrieved {html.Length} characters from {url}");
    },
    maxDegreeOfParalellism: maxThreads);

Technique 4 - Polly Bulkhead Policy

The final technique is to use the "Bulkhead" isolation policy from Polly. A bulkhead policy restricts the number of concurrent calls that can be made, and optionally allows you to queue up calls that exceed that number.

Here we set up a bulkhead policy with a constrained number of concurrent executions and an unlimited number of queued tasks. Then we simply call ExecuteAsync repeatedly on the bulkhead policy, allowing it to either run the task immediately or queue it up if too many are already executing.

var bulkhead = Policy.BulkheadAsync(maxThreads, Int32.MaxValue);
var tasks = new List<Task>();
foreach (var url in urls)
{
    var t = bulkhead.ExecuteAsync(async () =>
    {
        var html = await client.GetStringAsync(url);
        Console.WriteLine($"retrieved {html.Length} characters from {url}");
    });
    tasks.Add(t);
}
await Task.WhenAll(tasks);

As with several of our other solutions, we put the tasks into a list and use Task.WhenAll to wait for them. It is worth pointing out, though, that this pattern is really designed for the situation where concurrent tasks are being generated from multiple threads (for example from ASP.NET controller actions). Those callers simply share a single bulkhead policy, and each one just runs a task with await bulkhead.ExecuteAsync(...). So this approach is very straightforward to use in the situations it was designed for.
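
A sketch of that multi-producer scenario, with a single shared bulkhead that callers on any thread can use (the class and method names here are just illustrative):

// one bulkhead shared by every caller in the application
public static class DownloadThrottle
{
    // up to 4 concurrent executions, unlimited queue
    private static readonly IAsyncPolicy Bulkhead = Policy.BulkheadAsync(4, Int32.MaxValue);

    // callers on any thread are throttled to 4 concurrent downloads app-wide
    public static Task<string> GetThrottledAsync(HttpClient client, string url) =>
        Bulkhead.ExecuteAsync(() => client.GetStringAsync(url));
}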


Parallelism can greatly speed up the overall performance of your application, but when misused can cause more problems than it solves. These patterns allow you to use a constrained number of threads to work through a batch of jobs. The one you should pick depends on the way you're generating tasks - do you know them all up front, or are they created on the fly while you're already processing earlier tasks? And are you generating these tasks sequentially on a single thread, or are multiple threads able to produce additional work items on the fly?

Of course, I'm sure there are plenty of other clever ways of approaching this problem, so do let me know in the comments what your preferred solution is.


Azure Functions allows you to protect access to your HTTP triggered functions by means of authorization keys. For each function you can choose an "authorization level": anonymous means no API key is required, function means a function-specific API key is required (so each function has its own keys), and admin means you must provide the special "master" host key, a single key that can be used to call any function in your function app.

To call a protected function you either provide the key as a query string parameter (in the form code=<API_KEY>), or you can provide it in an x-functions-key HTTP header.
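
For example, calling a protected function from PowerShell using either mechanism would look something like this (myapp and myfunc are placeholder names, and $apiKey holds one of the keys discussed below):

# pass the key as a query string parameter...
Invoke-RestMethod -Method POST -Uri "https://myapp.azurewebsites.net/api/myfunc?code=$apiKey"
# ...or as a request header
Invoke-RestMethod -Method POST -Uri "https://myapp.azurewebsites.net/api/myfunc" -Headers @{ "x-functions-key" = $apiKey }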

Accessing and managing keys in the portal

The Azure portal makes it nice and simple to discover the values of these keys. First of all, if you navigate to any function in the portal, you'll see a "Get Function URL" link:


When we click it, it constructs the URL we need to call including the code query string parameter. This dialog also lets us access values for both types of key - the "function" keys specific to this function, and the "host" keys that can be used on all functions, including the special "_master" host key. You can read more about these key types here.


We can manage the keys for an individual function by heading into the "manage" tab for that function:


In here we get the ability to view, renew or revoke each individual function key as well as the host keys. You can create multiple function or host keys, which is great as it allows you to provide separate keys to every client you want to grant access to your function, or to implement key cycling.


Using the key management API

Now although it's very convenient to manage keys in the portal, before long you'll probably want to manage these values programmatically, and that's where things get a little bit tricky. There is a key management API which allows you to access the values of keys as well as to generate new ones, delete keys, or update them with new auto-generated values.

This is ideal if you want to automate the deployment of your function app and programmatically discover the keys you need to call the functions, but I quickly ran into a problem: how do you authorize calls to this API? I was familiar with authorizing calls to the Kudu API, which requires you to pass the deployment user and password in a basic auth header. I showed how to do this in a post I wrote a while back on deploying web apps with the Kudu zipdeploy API.

But unfortunately, this technique doesn't work for the key management API. I eventually stumbled across a GitHub issue that led me to the answer, so I thought I'd document my solution.

Getting the credentials to access the key management API is a two-step process. The first step is to call the Kudu api/functions/admin/token endpoint, which provides us with a token (a JWT) that we can use as a bearer token to call the key management API.

I'm using the Azure CLI and PowerShell, but these techniques could be adapted to whatever language or scripting tool you're using.

Before we can do that, we need the credentials to call the Kudu API. If you're authenticated with the Azure CLI, you can get them by calling the az webapp deployment list-publishing-profiles command and extracting the userName and userPWD for MSDeploy. You need to provide the function app name and resource group name.

The username and password are the same ones you can get from the portal by downloading the "publish profile" for your function app. My PowerShell function also converts the username and password into a Base64-encoded string in the right format to be used as a basic auth header.

function getKuduCreds($appName, $resourceGroup)
{
    $user = az webapp deployment list-publishing-profiles -n $appName -g $resourceGroup `
            --query "[?publishMethod=='MSDeploy'].userName" -o tsv

    $pass = az webapp deployment list-publishing-profiles -n $appName -g $resourceGroup `
            --query "[?publishMethod=='MSDeploy'].userPWD" -o tsv

    $pair = "$($user):$($pass)"
    $encodedCreds = [System.Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes($pair))
    return $encodedCreds
}

OK, now that we have the credentials we need to call Kudu, we can use them to call the functions/admin/token endpoint. So my next PowerShell function uses these credentials in a basic authorization header to get a JWT that we can use as a bearer token.

Then it uses that bearer token to call the key management API's admin/functions/<FUNCTION_NAME>/keys endpoint to retrieve all the keys for a specific function. I'm just picking out the first key in this example, but you could do something more elaborate if you wanted to access a key by name. Note that unlike the Kudu APIs, which are at https://<YOUR_APP_NAME>.scm.azurewebsites.net, this API is hosted by the Azure Functions runtime itself, so you find it at https://<YOUR_APP_NAME>.azurewebsites.net.

function getFunctionKey([string]$appName, [string]$functionName, [string]$encodedCreds)
{
    $jwt = Invoke-RestMethod -Uri "https://$appName.scm.azurewebsites.net/api/functions/admin/token" `
           -Headers @{Authorization=("Basic {0}" -f $encodedCreds)} -Method GET

    $keys = Invoke-RestMethod -Method GET -Headers @{Authorization=("Bearer {0}" -f $jwt)} `
            -Uri "https://$appName.azurewebsites.net/admin/functions/$functionName/keys"

    $code = $keys.keys[0].value
    return $code
}

With all these pieces in place, we are now in a position to put them together to get the authorization key for a specific function in our application:

$appName = "myapp"
$functionName = "myfunc"
$resourceGroup = "myresourcegroup"
$kuduCreds = getKuduCreds $appName $resourceGroup
$code = getFunctionKey $appName $functionName $kuduCreds
$funcUri = "https://$appName.azurewebsites.net/api/$functionName?code=$code"

# call the function
Invoke-RestMethod -Method POST -Uri $funcUri

Obviously that's just showing how to retrieve the keys for a function, but once you know how to authorize a call to this API, calling the other methods is pretty straightforward.
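
For example, deleting a named function key looks something like this (a sketch: I'm assuming the route follows the same admin/functions/<FUNCTION_NAME>/keys pattern with the key name appended, and "mykey" is a hypothetical key name):

$jwt = Invoke-RestMethod -Uri "https://$appName.scm.azurewebsites.net/api/functions/admin/token" `
       -Headers @{Authorization=("Basic {0}" -f $kuduCreds)} -Method GET
Invoke-RestMethod -Method DELETE -Headers @{Authorization=("Bearer {0}" -f $jwt)} `
    -Uri "https://$appName.azurewebsites.net/admin/functions/$functionName/keys/mykey"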

Hope you found this helpful. It's certainly been very useful for me in automating tests for my function apps.

Want to learn more about how easy it is to get up and running with Azure Functions? Be sure to check out my Pluralsight course Azure Functions Fundamentals.