
I've posted before about how you can deploy a WebApp as a zip with the Kudu zip deploy API. It's a great way to deploy web apps and is one of the techniques I discuss for deploying miniblog.core.

But as well as allowing us to deploy our web apps, Kudu has an API for managing webjobs. With this API we can deploy and update webjobs individually, as well as trigger them, configure their settings and even get their execution history.

Three types of webjob

There are three main types of webjob that you can use. First there are triggered webjobs. These are webjobs that run on demand, and are typically simple console apps. You can trigger an execution of one of these webjobs with the Kudu webjobs API (we'll see an example later). A typical use case for this type of webjob might be some kind of support tool that you want to run on demand.

The second type is a scheduled webjob, which is actually just a triggered webjob with a schedule cron expression. The schedule is defined in a settings.json file that sits alongside your webjob executable. This type of webjob is great for periodic cleanup tasks that you need to run on a regular basis without explicitly needing to do anything to trigger them.

Finally there is a continuous webjob. This is an executable that will be run continuously - that is, it will be restarted for you if it exits. This is great for webjobs that are responding to queue messages. The webjob sits listening on one (or many) queues, and performs an action when a message appears on that queue. There's a helpful SDK that makes it easier to build this type of webjob, although I won't be discussing the use of that today.
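Just to give a flavour (this is a rough sketch based on the WebJobs SDK 2.x, with a made-up queue name - it isn't part of this post's demo), a minimal continuous webjob built with the SDK might look something like this:

using System.IO;
using Microsoft.Azure.WebJobs;

class Program
{
    static void Main()
    {
        // the JobHost discovers functions like ProcessQueueMessage below;
        // it needs the AzureWebJobsStorage connection string to be configured
        var config = new JobHostConfiguration();
        var host = new JobHost(config);
        host.RunAndBlock(); // keep the process alive, listening for work
    }
}

public class Functions
{
    // runs whenever a message appears on the (hypothetical) "orders" queue
    public static void ProcessQueueMessage([QueueTrigger("orders")] string message, TextWriter log)
    {
        log.WriteLine("Processing order: " + message);
    }
}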

Where are webjobs stored?

Creating a webjob simply involves dumping our webjob binaries into specially named folders. For a triggered (or scheduled) job, the folder is wwwroot\app_data\jobs\triggered\{job name}, and for a continuous job, it's wwwroot\app_data\jobs\continuous\{job name}. The webjobs host will look inside that folder and attempt to work out what executable it should run (based on a set of naming conventions).

Why the app_data folder? Well, that's a special ASP.NET folder that is intended for storing your application data. The web server will not serve up the contents of this folder, so everything in there is safe. It's also treated as a special case for deployments - since it might contain application-generated data files, its contents won't get deleted or reset when you deploy a new version of your app.

An example scenario

Let's consider a very simple example where we have two webjobs that we want to host. One is a .NET Core executable (Webjob1), the other is a regular .NET 4.6.2 framework console app (Webjob2). And we'll also deploy an ASP.NET Core Web API, just to show that you can host webjobs in the same "Azure Web App" instance as a regular web app, although you don't have to.

We'll use a combination of the Azure CLI and PowerShell for all the deployments, but these techniques can be used with anything that can make zip files and web requests.

Step 1 - Create a web application

As always, with the Azure CLI, make sure you're logged in and have the right subscription selected first.

# log in to Azure CLI
az login
# make sure we are using the correct subscription
az account set -s "MySub"

And now let's create ourselves a resource group with an app service plan (free tier is fine here) and a webapp:

$resourceGroup = "WebJobsDemo"
$location = "North Europe"
$appName = "webjobsdemo"
$planName = "webjobsdemoplan"
$planSku = "F1" # allowed sku values B1, B2, B3, D1, F1, FREE, P1, P1V2, P2, P2V2, P3, P3V2, S1, S2, S3, SHARED.

# create resource group
az group create -n $resourceGroup -l $location

# create the app service plan
az appservice plan create -n $planName -g $resourceGroup -l $location --sku $planSku

# create the webapp
az webapp create -n $appName -g $resourceGroup --plan $planName

Step 2 - Get deployment credentials

We'll need the deployment credentials in order to call the Kudu web APIs. These can be easily retrieved with the Azure CLI, making use of the query syntax which I discuss in my Azure CLI: Getting Started Pluralsight course.

# get the credentials for deployment
$user = az webapp deployment list-publishing-profiles -n $appName -g $resourceGroup `
    --query "[?publishMethod=='MSDeploy'].userName" -o tsv

$pass = az webapp deployment list-publishing-profiles -n $appName -g $resourceGroup `
    --query "[?publishMethod=='MSDeploy'].userPWD" -o tsv

Step 3 - Build and zip the main web API

As I said, there is no requirement for our Azure "webapp" to actually contain a web app - it could just host a bunch of webjobs. But to show that the two can co-exist, let's build and zip an ASP.NET Core Web API application. I'm just using a very basic example app created with dotnet new webapi. We're using some .NET types from PowerShell to create the zip.

$publishFolder = "publish"

# publish the main API
dotnet publish MyWebApi -c Release -o $publishFolder

# make the zip for main API
$mainApiZip = "publish.zip"
if(Test-path $mainApiZip) {Remove-item $mainApiZip}
Add-Type -assembly "system.io.compression.filesystem"
[io.compression.zipfile]::CreateFromDirectory($publishFolder, $mainApiZip)

Step 4 - Deploy with Kudu zip deploy

The Azure CLI offers us a really nice and easy way to use the Kudu zip deploy API. We simply need to use the config-zip deployment source:

az webapp deployment source config-zip -n $appName -g $resourceGroup --src $mainApiZip

However, a regression in Azure CLI 2.0.25 meant this was broken, so as an alternative you can call the zip deploy API directly with the following code, passing the credentials we retrieved earlier.

# set up deployment credentials
$creds = "$($user):$($pass)"
$encodedCreds = [System.Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes($creds))
$basicAuthValue = "Basic $encodedCreds"

$Headers = @{
    Authorization = $basicAuthValue
}

# use kudu deploy from zip file
Invoke-WebRequest -Uri https://$appName.scm.azurewebsites.net/api/zipdeploy -Headers $Headers `
    -InFile $mainApiZip -ContentType "multipart/form-data" -Method Post

If we want to verify that the deployment worked, we can get the URI of the web app, and call the values controller (which the default webapi template created for us):

# check it's working
$apiUri = az webapp show -n $appName -g $resourceGroup --query "defaultHostName" -o tsv
Start-Process https://$apiUri/api/values

Step 5 - Build our first webjob

In our demo scenario we have two webjobs. The first (Webjob1) is a .NET Core command line app. The code is very simple, just echoing a message that includes the command line arguments:

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Hello from Task 1 (.NET Core) with args [{0}]!", 
            string.Join('|',args));
    }
}

Since .NET Core apps are just DLLs, we need to help the webjobs host know how to run them by creating a run.cmd batch file that calls the dotnet runtime and passes on any command line arguments. Note: you can get weird errors here if this file is UTF-8 encoded. Make sure you save the batch file as ASCII.

@echo off
dotnet Webjob1.dll %*

Building and zipping this webjob is no different to what we did with the main web API:

# now lets build the .NET core webjob
dotnet publish Webjob1 -c Release

$task1zip = "task1.zip"
if(Test-path $task1zip) {Remove-item $task1zip}
[io.compression.zipfile]::CreateFromDirectory("Webjob1\bin\Release\netcoreapp2.0\publish\", $task1zip)

Step 6 - Deploy the webjob

Deploying a webjob using the Kudu webjobs API is very similar to zip deploying the main web app. We simply need to provide one extra Content-Disposition header, and we use the PUT verb. We indicate that this is going to be a triggered webjob by including triggeredwebjobs in the path, and we also include the webjob name (in this case "Webjob1").

$ZipHeaders = @{
    Authorization = $basicAuthValue
    "Content-Disposition" = "attachment; filename=run.cmd"
}

# upload the job using the Kudu WebJobs API
Invoke-WebRequest -Uri https://$appName.scm.azurewebsites.net/api/triggeredwebjobs/Webjob1 -Headers $ZipHeaders `
    -InFile $task1zip -ContentType "application/zip" -Method Put

To check it worked, you can visit the Kudu portal and explore the contents of the app_data folder or look at the web jobs page.

# launch Kudu portal
Start-Process https://$appName.scm.azurewebsites.net

We can also check by calling another web jobs API method to get all triggered jobs:

# get triggered jobs
Invoke-RestMethod -Uri https://$appName.scm.azurewebsites.net/api/triggeredwebjobs -Headers $Headers `
    -Method Get

Step 7 - Run the webjob

To run the webjob we can POST to the run endpoint for this triggered webjob. And we can optionally pass arguments in the query string. Don't forget to provide the content type or you'll get a 403 error.

# run the job
$resp = Invoke-WebRequest -Uri "https://$appName.scm.azurewebsites.net/api/triggeredwebjobs/Webjob1/run?arguments=eggs bacon" -Headers $Headers `
    -Method Post -ContentType "multipart/form-data"

Assuming this worked, we'll get a 202 back, and the response's Location header points at the details of this particular run. Querying that URI gives us, among other things, an output_url we can call to retrieve the log output, which lets us see that our webjob ran successfully and received the arguments we passed it:

# output response includes a Location to get history:
if ($resp.RawContent -match "\nLocation\: (.+)\n")
{
    $historyLocation = $matches[1]
    $hist = Invoke-RestMethod -Uri $historyLocation -Headers $Headers -Method Get
    # $hist has status, start_time, end_time, duration, error_url etc
    # get the logs from output_url
    Invoke-RestMethod -Uri $hist.output_url -Headers $Headers -Method Get
}

We can also ask for all runs of this webjob with the /history endpoint:

# get history of all runs for this webjob
Invoke-RestMethod -Uri https://$appName.scm.azurewebsites.net/api/triggeredwebjobs/Webjob1/history -Headers $Headers `
    -Method Get

Step 8 - Deploy and configure a scheduled webjob

For our second webjob, we're using a regular .NET console app running on the full .NET Framework. Here's the code:

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Hello from task 2 (.NET Framework) with args [{0}]", 
            string.Join("|", args));
    }
}

We'll build it with MSBuild and create a zip, very similar to what we did with the first webjob:

# build the regular .net webjob
$msbuild = "C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\MSBuild\15.0\Bin\msbuild.exe"
. $msbuild "Webjob2\Webjob2.csproj" /property:Configuration=Release

$task2zip = "task2.zip"
if(Test-path $task2zip) {Remove-item $task2zip}
[io.compression.zipfile]::CreateFromDirectory("Webjob2\bin\Release\", $task2zip)

And then upload it just like we did with the first webjob. Remember a "scheduled" webjob is just a special case of triggered webjob, so we use the triggeredwebjobs endpoint again:

# upload the web job
$ZipHeaders = @{
    Authorization = $basicAuthValue
    "Content-Disposition" = "attachment; filename=Webjob2.exe"
}

Invoke-WebRequest -Uri https://$appName.scm.azurewebsites.net/api/triggeredwebjobs/Webjob2 -Headers $ZipHeaders `
    -InFile $task2zip -ContentType "application/zip" -Method Put

Now if we'd included a settings.json containing a cron expression in our zip file, this would already be a scheduled job and there would be nothing further to do. But there's a very handy /settings endpoint that lets us push the contents of the settings file, which we can use to set the schedule. Here we'll set up our second webjob to run every five minutes.

$schedule = '{
  "schedule": "0 */5 * * * *"
}'

Invoke-RestMethod -Uri https://$appName.scm.azurewebsites.net/api/triggeredwebjobs/Webjob2/settings -Headers $Headers `
    -Method Put -Body $schedule -ContentType "application/json"

The great thing about this approach is that we can change the schedule without having to push the whole webjob again. And even though this webjob is on a schedule, there's nothing to stop us running it on-demand as well if we want to.

Updating and deleting webjobs

It's very easy to update webjobs (or indeed the main API). You just zip up your new version of the webjob exactly as we did before and upload it through the API. The webjobs are left intact when a new version of the main web app is deployed, so it's safe to update that as well with the zip deploy API.

You can also easily delete webjobs if you no longer need them:

Invoke-WebRequest -Uri https://$appName.scm.azurewebsites.net/api/triggeredwebjobs/Webjob2 -Headers $Headers `
    -Method Delete

Summary

As you can see, the Kudu webjobs API makes it very straightforward to deploy, run, query and update your webjobs. This makes it a convenient platform for running occasional maintenance tasks. We've seen in this post how this can be easily scripted in PowerShell with the Azure CLI, but you can of course use your preferred shell and language to call the same APIs.

Want to learn more about the Azure CLI? Be sure to check out my Pluralsight course Azure CLI: Getting Started.


Regular readers of my blog will know I'm a big fan of Azure Functions, and one very exciting new addition to the platform is Durable Functions. Durable Functions is an extension to Azure Functions allowing you to create workflows (or "orchestrations") consisting of multiple functions.

An example workflow

Imagine we are selling e-books online. When an order comes in we might have a simple three-step workflow. First, let's charge the credit card. Then, we'll create a personalized PDF of the ebook watermarked with the purchaser's email address to discourage them from sharing it online. And finally we'll email them a download link.

[Image: example workflow]

Why do we need Durable Functions?

This is about as simple a workflow as you can imagine - just three calls, one after the other. And you might well be thinking, "I can already implement a workflow like that with Azure Functions (or any other FaaS platform), so why do I need Durable Functions?"

Well, let's consider how we might implement our example workflow without using Durable Functions.

Workflow within one function

The simplest implementation is to put the entire workflow inside a single Azure function. First call the payment provider, then create the PDF, and then send the email.

void MyWorkflowFunction(OrderDetails order)
{
    ChargeCreditCard(order.PaymentDetails, order.Amount);
    var pdfLocation = GeneratePdf(order.ProductId, order.EmailAddress);
    SendEmail(order.EmailAddress, pdfLocation);
}

Now there are a number of reasons why this is a bad idea, not least of which is that it breaks the "Single Responsibility" principle. More significantly, whenever we have a workflow, we need to consider what happens should any of the steps fail. Do we retry, do we retry with backoff, do we need to undo a previous action, do we abort the workflow or can we carry on regardless?

This one function is likely to grow in complexity as we start to add in error handling and the workflow picks up additional steps.

There's also the issue of scaling - by lumping all the steps together they cannot be scaled independently. And how long does this function take to run? The longer it runs for, the greater the chance that the VM it's running on could cycle mid-operation, leaving us needing to somehow make the workflow steps "idempotent" so we can carry on from where we left off on a retry.

Workflow via messages

A more common way to implement a workflow like this would be to use intermediate queues between the functions. So the first function charges the credit card, and then puts a message in a queue to trigger the PDF generator function. When that function has finished it puts a message in a queue that triggers the emailing function.

[Image: workflow via messages]
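To make that concrete, here's a rough sketch of what the first step might look like as a queue-triggered Azure Function (the queue names are made up for illustration, and ChargeCreditCard stands in for whatever payment code you'd really call):

[FunctionName("ChargeCreditCardFunction")]
public static void Run(
    [QueueTrigger("new-orders")] OrderDetails order,      // triggered by a new order message
    [Queue("generate-pdf")] out OrderDetails nextMessage)  // output binding for the next step's queue
{
    ChargeCreditCard(order.PaymentDetails, order.Amount);  // same helper as in the earlier example
    nextMessage = order; // this function has to know which queue (and so which step) comes next
}

The PDF generator and email functions would follow the same pattern - do their own work, then post a message to whatever queue comes next.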

This approach has a number of benefits. Each function just performs a single step in the workflow, and the use of queues means that the platform will give us a certain number of retries for free. If the PDF generation turns out to be slow, the Azure Functions runtime can spin up multiple instances that help us clear through the backlog. It also solves the problem of the VM going down mid-operation. The queue messages allow us to carry on from where we left off in the workflow.

But there are still some issues with this approach. One is that the definition of our workflow is now distributed across the codebase. If I want to understand what steps happen in what order, I must examine the code for each function in turn to see what gets called next. That can get tedious pretty quickly if you have a long and complex workflow. It also still breaks the "single responsibility principle", because each function knows both how to perform its step in the workflow and what the next step should be. To introduce a new step into the workflow requires finding the function before the insertion point and changing its outgoing message.

Workflow via a process manager function

Can we do better than this? Well, let's imagine that we introduce a fourth function - a "process manager" or "orchestrator". This function has the job of calling each of the three "activity" functions in turn. It does so by posting queue messages that trigger them, but when they are done, instead of knowing what the next function is, they simply report back to the orchestrator that they have finished by means of another queue message.

[Image: process manager]

What this means is that the orchestrator function has to keep track of where we are in the workflow. Now it might just about be possible to achieve this without needing a database, if the messages in and out of each "activity" function keep hold of all the state. But typically in practice we'd find ourselves needing to store state in a database and look up where we got to.

Here's a very simplistic implementation of a process manager function that doesn't perform any error handling and assumes all necessary state is held in the messages.

void MyProcessManagerFunction(WorkflowMessage message, Queue queue)
{
    switch(message.Type)
    {
        case "NewOrderReceived":
            queue.Send("CreditCardQueue", new ChargeCreditCardMessage(message.OrderDetails));
            break;
        case "CreditCardCharged":
            queue.Send("GeneratePdf", new GeneratePdfMessage(message.OrderDetails));
            break;
        case "PdfGenerated":
            queue.Send("SendEmail", new SendEmailMessage(message.OrderDetails, message.PdfLocation));
            break;
        case "EmailSent":
            // workflow is complete
            break;
    }
}

This solution has several benefits over the previous two approaches we discussed, but we have now got four queues and four message types just to implement a simple three-step sequential workflow. To implement error handling, our response messages would need to be upgraded to include a success flag, and if we were implementing retries, we'd want a way to count how many attempts we'd already made.

So even with this approach, things can get complex.
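To give a feel for that, here's a rough sketch (purely illustrative - the exact shape would depend on your workflow) of the extra bookkeeping those messages start to need:

class WorkflowMessage
{
    public string Type { get; set; }            // e.g. "CreditCardCharged"
    public OrderDetails OrderDetails { get; set; }
    public string PdfLocation { get; set; }
    public bool Succeeded { get; set; }         // success flag for error handling
    public string Error { get; set; }           // diagnostic details if the step failed
    public int AttemptCount { get; set; }       // how many times we've tried this step
}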

Durable Functions to the rescue

This is where Durable Functions comes in. Durable Functions makes it much easier to implement this kind of process manager or "orchestrator" pattern. We still have a function per "activity" and we have an "orchestrator" function, but now we don't need to manage any of the messaging between the functions ourselves. Durable Functions also manages the workflow state for us, so we can keep track of where we are.

Even better, Durable Functions makes really creative use of the C# await keyword to allow us to write our orchestrator function as though it were a regular function that calls each of the three activities in turn, even though what is really happening under the hood is a lot more involved. That's thanks to the fact that the Durable Functions extension is built on Microsoft's existing Durable Task Framework.

Here's a slightly simplified example of what a Durable Functions orchestrator function might look like in our situation. An orchestration can receive arbitrary "input data" and call activities with CallActivityAsync. Each activity function can receive input data and return output data.

async Task MyDurableOrchestrator(DurableOrchestrationContext ctx)
{
    var order = ctx.GetInput<OrderDetails>();
    await ctx.CallActivityAsync("ChargeCreditCard", order);
    var pdfLocation = await ctx.CallActivityAsync<string>("GeneratePdf", order);
    await ctx.CallActivityAsync("SendEmail", new { order.EmailAddress, pdfLocation });
}

As you can see, the code is now extremely easy to read - the orchestrator defines the workflow but isn't concerned with the mechanics of how the activities are actually called.
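For completeness, here's roughly what one of the activity functions might look like (a sketch using the Durable Functions preview's ActivityTrigger binding - the body is just a placeholder):

[FunctionName("GeneratePdf")]
public static string GeneratePdf([ActivityTrigger] OrderDetails order)
{
    // create the watermarked PDF and upload it to storage (not shown),
    // then return its location so the orchestrator can pass it to the next activity
    return $"pdfs/{order.ProductId}.pdf"; // placeholder location
}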

Error handling

What about error handling? This is where things get even more impressive. With Durable Functions we can easily put retry policies with custom back-offs around an individual activity function simply by using CallActivityWithRetryAsync, and we can catch exceptions wherever they occur in our workflow (whether in the orchestrator itself or in one of the activity functions) just with a regular C# try...catch block.

Here's an updated version of our function that will make up to four attempts to charge the credit card with a 30 second backoff between each one. It also has an exception handler which could be used for logging, but equally could call other activity functions - maybe we want to send some kind of alert to the system administrator and attempt to inform the customer that there was a problem processing their order.

async Task MyDurableOrchestrator(DurableOrchestrationContext ctx)
{
    try
    {
        var order = ctx.GetInput<OrderDetails>();
        await ctx.CallActivityWithRetryAsync("ChargeCreditCard", 
            new RetryOptions(TimeSpan.FromSeconds(30), 4), order);
        var pdfLocation = await ctx.CallActivityAsync<string>("GeneratePdf", order);
        await ctx.CallActivityAsync("SendEmail", new { order.EmailAddress, pdfLocation });
    }
    catch (Exception e)
    {
        // log the exception, and call another activity function if needed
    }
}

Benefits of Durable Functions

As you can see, Durable Functions addresses all the issues with implementing this workflow manually.

  • It allows us to have an orchestrator that very straightforwardly shows us what the complete workflow looks like.
  • It lets us put each individual "activity" in the workflow into its own function (giving us improved scalability and resilience).
  • It hides the complexity of handling queue messages and passing state in and out of functions, storing state for us transparently in a "task hub" which is implemented using Azure Table Storage and Queues.
  • It makes it really easy to add retries with backoffs and exception handling to our workflows.
  • It opens the door to more advanced workflow patterns, such as performing activities in parallel or waiting for human interaction (although those are topics for a future post).

So although Durable Functions is still in preview and has a few issues that need ironing out before it goes live, I am very excited about its potential. Whilst it is possible to implement similar workflow management code yourself, and there are various other frameworks offering similar capabilities, this really is remarkably simple to use and an obvious choice if you're already using Azure Functions.

Want to learn more about how easy it is to get up and running with Azure Functions? Be sure to check out my Pluralsight course Azure Functions Fundamentals.


I had the privilege of speaking on Technical Debt at DevOps Oxford last week, and although most of my focus was on talking about what "technical debt" is and how we can repay it, I wanted to include a few thoughts on whether it's possible to write code in such a way that we don't accumulate large amounts of technical debt in the first place. Prevention is better than cure, right?

Practices for avoiding technical debt build-up

There are a lot of very obvious practices that will help reduce the rate at which we accumulate technical debt:

  • Ensuring we write "clean code" and follow "SOLID" coding practices
  • Being disciplined about creating automated tests for all of our code
  • Repaying known technical debt promptly, and not putting it on the backburner
  • Not letting doing things the "quick way" instead of the "right way" become the norm - deciding to take on technical debt should be the exception, not the rule
  • Using the insights gained from metrics and retrospectives to strategically drive initiatives to improve the codebase.

Technical debt is proportional to lines of code

All of these ideas ought to decrease the rate at which technical debt accumulates in our projects. But irrespective of this, I've come to think that technical debt is proportional to the number of lines of code. The more code you have, the more technical debt you have.

That doesn't mean that two projects each containing a million lines of code will have the same amount of technical debt as the other. One may be on the verge of collapse, while the other may still be very maintainable. But you can be sure that in both codebases, there's more technical debt now than there was when there were only 500,000 lines of code, and more will be introduced with the next 500,000 lines to be written.

Modularization

If technical debt is proportional to lines of code, then one of the most effective strategies must surely be modularization. Instead of having everything in one monolithic source code repository, look for ways to extract chunks of functionality into their own smaller projects.

This might be extracting a component (e.g. an npm or NuGet package), or moving some functionality out into a separately deployed microservice.

These modules should have their own source control, CI build, and deployment procedures. And they can be worked on independently of the rest of the system.

Well defined interfaces

Now you might be thinking - modularization can't magically make technical debt disappear. And of course it won't. But it does push it out of our way.

So long as the components or services we extract have a well-defined interface, we don't need to concern ourselves with the internal implementation details of those components, or worry about technical debt that may be hiding in there.

One question that came up at the user group is whether the extracted modules should be developed in a consistent way. Should all our components/microservices share the same tech stack, coding conventions, build and unit testing strategy etc? Obviously this sort of consistency is very beneficial, as it allows developers to get up to speed very quickly. But conversely, one advantage of breaking components up like this is that we have freedom to adopt newer and better tools, technologies and practices. So be consistent where it makes sense, but don't let it get in the way of progress.

The advantages of smaller codebases

The fact is that smaller codebases have multiple advantages compared to bigger ones.

  • They fit into our tiny brains more easily. It's possible for a new developer to explore the project and fully understand it in a short period of time. This helps address the problem I call "knowledge debt" in my Pluralsight course, where developer turnover means there are large swathes of code that no one understands.
  • They are easier to refactor. In large codebases, you find that tight coupling gets introduced, and so changes as insignificant as a method signature change can have ripple effects that take ages to resolve. With smaller codebases, "find all references" returns a manageable number of results, making it much easier to reason about the impact of changing something.
  • They are easier to test. When we take the trouble to extract code out into its own service or component, we introduce a "seam" that makes testing possible. By decoupling it from its consumers we make it possible to test it in isolation, and by introducing a clean interface, we make it easy for its consumers to mock it.
  • They are possible to rewrite or replace. Another issue I discuss in my Pluralsight course is "architectural debt" where the architecture of our system is proving an obstacle to future development, either because we just got it wrong, or because the project has moved in new and unanticipated directions. Resolving architectural debt is painful because it often requires major code upheaval. If a component or service is small enough to rewrite, then it becomes possible to introduce the newer version in an incremental fashion throughout the rest of the codebase.
  • They allow migration to newer tools and technologies. This addresses what I call "technology debt" - where we get stuck on a legacy version of some technology or open source dependency, and aren't able to upgrade because the knock-on effects are too great. Modularization allows individual services or components to adopt new technologies at their own pace, without needing to change everything at once.

No silver bullet

Is modularization the "silver bullet" that means we can create huge systems that have no technical debt problems? Of course not! I'm sure you could make a horrendous mess out of lots of small components. And certainly microservices have potential to introduce a whole new class of difficulties that a non-distributed system doesn't suffer from.

But personally, every time I've found myself on a project that has a "technical debt" problem, the sheer size of the codebase has been a major obstacle to addressing the key issues. But where we've been able to break things up into more manageable sizes, it becomes much easier to address the technical debt problems that are causing us the most acute pain. Not all technical debt has the same "interest rate", so we need to be strategic about what order we tackle things in.

Anyway, those are my thoughts. I know the topic of "technical debt" always generates lots of lively debate, so I'd love to hear your ideas in the comments.

Want to learn more about the problem of technical debt and how you can reduce it? Be sure to check out my Pluralsight course Understanding and Eliminating Technical Debt.