ARM Templates vs Azure CLI

Recently, I've been posting tutorials about how to deploy Azure Function Apps with the Azure CLI and create a managed identity to enable your Function App to access Key Vault. I love how easy the Azure CLI makes it to quickly deploy and configure infrastructure in Azure.

But is the Azure CLI the right tool for the job? After all, aren't we supposed to be using ARM templates? If you've not used them before, ARM templates are simply JSON files describing your infrastructure which can be deployed with a single command.

My general recommendation is that while the Azure CLI is great for experimenting and prototyping, once you're ready to push to production, it would be a good idea to create ARM templates and use them instead.

However, in November, an interesting tweet caught my eye. Pascal Naber wrote a blog post making the case that ARM is unnecessarily complex compared to just using the Azure CLI. And I have to admit, I have some sympathy with this point of view. In the article he shows a 200+ line ARM template and contrasts it with about 10 lines of Azure CLI to achieve the same result.

I’ve written a new blogpost: “Stop using ARM templates! Use the Azure CLI instead”. Read it here: https://t.co/MhOeKpvnDR #azure #cli #arm #devops @Xpiritbv
— Pascal Naber (@pascalnaber) November 12, 2018

So in this article I want to give my thoughts on the merits of the two different approaches: ARM templates which are a very declarative way of expressing your infrastructure (i.e. what should be deployed), versus Azure CLI scripts which represent a more imperative approach (i.e. how it should be deployed).

Infrastructure as Code

The term "infrastructure as code" is used to express the idea that your infrastructure deployment should be automated and repeatable, and the "code" that defines your infrastructure should be stored in version control. This makes a lot of sense: you don't want error-prone manual processes to be involved in the deployment of your application, and you want to be sure that if all your infrastructure was torn down, you could easily recreate exactly the same environment.

But "infrastructure as code" doesn't dictate what file format or DSL our infrastructure should be defined in. The most common approaches are JSON (used by ARM templates) and YAML (used by Kubernetes). Interestingly, Service Fabric Mesh has introduced a YAML format that gets converted behind the scenes into an ARM template, presumably because the YAML allows a simpler way of expressing the makeup of the application (we'll come back to this idea later).

However, there's no obvious reason why a PowerShell or Bash script couldn't also count as "infrastructure as code", or even an application written in JavaScript or C#. And thanks to the Azure CLI, Azure PowerShell, Azure SDK for .NET and Azure Node SDK, you can easily use any of those options to automate deployments.

The key difference is not whether both approaches count as "infrastructure as code", but the idea that declarative ways of defining the infrastructure are better than imperative. A JSON document contains no logic - it simply expresses all the "resources" that form the infrastructure and their configuration. Whereas if we choose to write a script using the Azure CLI, then it is inherently imperative - it describes the steps required to provision the infrastructure.

So which is best?

Declarative

Well, the received wisdom is definitely that declarative is best. Azure strongly encourages you to use JSON-based ARM templates, Service Fabric Mesh and Docker use YAML, and other popular infrastructure as code tools like Terraform have their own file format designed to be a more readable alternative to JSON.

In most cases, you are simply defining the "resources" that form your infrastructure - e.g. I want a SQL Server, a Storage Account, an App Service Plan and a Function App. You also get to specify all the properties: what location should the resources be, what pricing tier/sizing do I want, what special configuration settings do I need to enable? Most of these formats also allow you to include the application code itself as a configuration property: you can specify what version of a Docker image your Web App should run, or what GitHub repository the source code for your Function App can be found in, allowing a fully-working application to be deployed with a single command.

There are several key benefits to the declarative approach. First of all, it uses a desired state approach, which allows for incremental and idempotent deployments. In other words, your template defines what resources you want to be present, and so the act of deploying that template will only take effect if those resources are not already present, or are not in the state you requested. This means that deploying an ARM template is idempotent - there is no danger in deploying it twice - you won't end up with double of everything, or errors on the second run-through.
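To make the desired state idea concrete, here is a minimal sketch in Python. It is not how the ARM deployment engine is implemented; the resource names and the in-memory `actual` dictionary are assumptions made purely for illustration. It compares what you asked for with what exists and only acts on the difference, which is why a second run does nothing.

```python
def plan_changes(desired, actual):
    """Compare desired resources with what exists and list the actions needed."""
    actions = []
    for name, config in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != config:
            actions.append(("update", name))
    return actions


def apply(desired, actual):
    """Apply the plan to the in-memory 'actual' state and return the actions taken."""
    actions = plan_changes(desired, actual)
    for _, name in actions:
        actual[name] = dict(desired[name])
    return actions


# Hypothetical resources, standing in for a real environment.
desired = {
    "storage": {"sku": "Standard_LRS"},
    "plan": {"sku": "B1"},
}
actual = {}

first_run = apply(desired, actual)   # everything is missing, so everything is created
second_run = apply(desired, actual)  # nothing differs, so nothing happens

desired["plan"]["sku"] = "S1"
third_run = apply(desired, actual)   # only the changed resource is touched

print(first_run)
print(second_run)
print(third_run)
```

The second run returning an empty list of actions is the idempotency property in miniature.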

There are some other nice benefits to declarative template files. They can be validated in advance of running, greatly reducing the chance that you could end up with a half-complete deployment. The underlying deployment engine can intelligently optimize by identifying which resources are needed first and what steps can be performed in parallel. Any logic to retry actions in the case of transient failures is also built into the template deployment engine. And templates can be parameterized, allowing you to use the same template to deploy to staging as well as production. Parameters also enable you to avoid storing secrets in templates.
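As a rough illustration of how an engine could work out ordering and parallelism from declared dependencies, the sketch below uses Python's standard `graphlib` module (Python 3.9+). The resource names and the dependency map are hypothetical, and this is only an analogy for what a template deployment engine does, not a description of its internals.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on (hypothetical names).
dependencies = {
    "functionApp": {"storageAccount", "appServicePlan"},
    "storageAccount": set(),
    "appServicePlan": set(),
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

# Each batch contains resources whose dependencies are already satisfied,
# so everything inside one batch could be deployed in parallel.
batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    batches.append(ready)
    sorter.done(*ready)

print(batches)
```

Here the storage account and the app service plan land in the first batch, and the function app waits for both.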

But it's not all great. Declarative template formats like ARM tend to suffer from a number of weaknesses. The templates themselves are often very verbose, especially if you get a tool to auto-generate them, and if you prefer to hand-roll them, the documentation is often sparse, and it's a cumbersome process. When I build ARM templates I usually start by copying one of the Azure Quickstart templates and adapting it to my needs. But often that requires me to also visit resources.azure.com to attempt to deduce what template setting is needed to enable a feature I only know how to turn on via the portal. It can be a painfully slow and error-prone process.

Another issue is that although YAML and JSON files are touted as being "human readable", the fact is that they quickly lose their readability once they go beyond a screen-full of text, as Pascal's example clearly demonstrated.

And there are some practical annoyances. For example, a while ago I deployed a resource group that used some secrets. I parameterized them in the template (as is the best practice), and so when I initially deployed the ARM template, I provided those secret values. But the trouble was, now every time I wanted to redeploy the template because of some other unrelated change, I needed to source those secret values again even though they weren't modified. There didn't seem to be an obvious way of asking it to simply leave those secrets with the values they had on a previous deployment.

And this brings me onto the final issue that you inevitably run into with these templates. They end up requiring their own pseudo-programming language. In ARM templates, there are often dependencies between items. I need the Storage Account to be created before the Function App, because the Function App has an App Setting pointing at the connection string for the Storage Account. In the case of a web app that talks to a database it might be even more complex, with the database needing the web app's IP address in order to set up firewall rules, while the web app needs the database's connection string, resulting in a circular dependency.

The ARM template syntax has the concept of 'variables' which can be calculated from parameters, and can be manipulated using various helper functions such as 'concat' and 'listKeys', as you can see in the following example:

{
    "name": "AzureWebJobsStorage",
    "value": "[concat('DefaultEndpointsProtocol=https;AccountName=', variables('storageAccountName'), ';AccountKey=', listKeys(variables('storageAccountId'),'2015-05-01-preview').key1)]"
},

And this seems to be an inevitable pattern in any declarative template format that attempts to define something moderately complex - you end up wanting regular programming constructs, such as conditional expressions, string manipulations, and loops. Here's a snippet I saw recently from an API Management policy defined in XML, which has also introduced a level of scripting:

<set-header name="X-User-Groups" exists-action="override">
    <value>
        @(string.Join(";", (from item in context.User.Groups select item.Name)))
    </value>
</set-header>

The frustration I have with these DSLs within templates is that they are very limiting, lack support for intellisense and syntax highlighting, and tend to make our templates more indecipherable and fragile. Escaping values correctly can become a real headache as you can find yourself encoding JSON strings within JSON strings.
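The escaping problem is easy to reproduce. The following sketch (Python, with made-up setting values) serializes a small JSON object and then embeds that string inside another JSON document, which is the "JSON within JSON" situation described above.

```python
import json

# A hypothetical setting whose value itself contains double quotes.
inner = {"name": "AzureWebJobsStorage", "value": 'say "hello"'}

# First level of encoding: the quotes inside the value are escaped once.
inner_as_string = json.dumps(inner)

# Second level: the already-escaped string is embedded in another document,
# so every quote and every backslash gets escaped again.
outer = json.dumps({"settings": inner_as_string})

print(inner_as_string)
print(outer)

# Decoding twice recovers the original object.
recovered = json.loads(json.loads(outer)["settings"])
```

Letting a library do the encoding keeps this manageable. Writing the doubly escaped form by hand inside a template is where the mistakes tend to creep in.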

Imperative

So why not just write our deployment scripts in a regular scripting or programming language? There are some obvious benefits. The language already has familiar syntax, supporting conditional steps, storing and manipulating variables for later use, generating unique names according to a custom naming convention, and much more. Our editors can help us with intellisense, syntax highlighting and refactoring shortcuts.

Also, we can follow the principles of "clean code" and extract blocks of logic into reusable methods. So I might make a method that knows how to create an Azure Function App configured just the way I like it, with specific features enabled, and specific resource tags that I always apply. This allows the top-level deployment script/code to read very naturally whilst hiding the less interesting or repetitive details at a lower level.

For example, the fluent Azure C# SDK syntax gives an idea of what this could look like. Here's creating a web app:

var app1 = azure.WebApps
    .Define("MyUniqueWebAddress")
    .WithRegion(Region.USWest)
    .WithNewResourceGroup("MyResourceGroup")
    .WithNewWindowsPlan(PricingTier.StandardS1)
    .Create();

And you could easily build upon this approach by defining your own custom extension methods.
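The same idea of wrapping your conventions in a reusable helper works with the Azure CLI too. The sketch below is in Python and only builds the argument list for `az functionapp create`; it does not run anything. The helper name, the tag convention and the example resource names are all hypothetical, and depending on your CLI version the command may need further flags (such as a runtime), so treat it as a starting point rather than a finished script.

```python
# Hypothetical house-style tags applied to every resource we create.
STANDARD_TAGS = {"team": "platform", "managed-by": "script"}


def function_app_command(name, resource_group, storage_account, location, extra_tags=None):
    """Build the argument list for 'az functionapp create' with standard tags applied."""
    tags = {**STANDARD_TAGS, **(extra_tags or {})}
    command = [
        "az", "functionapp", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--storage-account", storage_account,
        "--consumption-plan-location", location,
        "--tags",
    ]
    # --tags takes space-separated key=value pairs; sort them for a stable result.
    command += [f"{key}={value}" for key, value in sorted(tags.items())]
    return command


command = function_app_command(
    "my-func", "my-rg", "mystorage", "westeurope", extra_tags={"env": "staging"}
)
print(" ".join(command))
```

The top-level script then reads as a list of intentions, and the resulting list could be passed to `subprocess.run(command, check=True)` once you are happy with it.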

Just like ARM templates, imperative deployment scripts can easily be parameterized, ensuring you keep secrets out of source control, and can reuse the same script for deploying to different environments.

But imperative deployment scripts like this do potentially have some serious drawbacks. The first is: what about idempotency? If I run my script twice, will it fail the second time because things are already there? Can it work out what's missing and only create that? Well, we don't want to bloat our script to have to put lots of conditional logic in, checking if a resource exists and only creating it if it is missing, but it turns out that it's not all that hard to achieve. In fact, Pascal Naber recently posted a gist showing an idempotent bash script using the Azure CLI to deploy a Function App configured to access Key Vault. You can safely run it multiple times.

For example if I run the following Azure CLI commands multiple times, I won't get any errors:

az group create -n "IdempotentTest" -l "west europe"
az appservice plan create -n "IdempotentTest" -g "IdempotentTest" --sku B1

But what about the desired state capabilities of a declarative framework like ARM templates? What if we wanted a Standard rather than Basic tier app service plan? Let's try:

az appservice plan create -n "IdempotentTest" -g "IdempotentTest" --sku S1

And this works - our app service plan gets upgraded to the standard tier! Let's make it harder. What if we decide it should be a Linux app service plan:

az appservice plan create -n "IdempotentTest" -g "IdempotentTest" `
                --sku S1 --is-linux

And now we get an error - "You cannot change the OS hosting your app at this time. Please recreate your app with the desired OS." Although, to be fair, I'm not sure an ARM template deployment would fare any better attempting to make this change. Not all modifications to desired state can be straightforwardly implemented.

To be honest, I was a little surprised by this. I hadn't realised the Azure CLI had this capability, and it makes it a much more competitive alternative to ARM templates. I haven't tried the same thing with the Azure SDK for .NET - that would be an interesting experiment for the future.

This leaves me thinking that ARM templates actually offer very few tangible benefits over using a scripting approach with Azure CLI. Perhaps one weakness of the scripting approach is that idempotency certainly is not automatic. You'd have to think very carefully about what the conditional steps and other logic in your scripts were doing. For example, if you generate a random suffix for a resource name like I do in many of my PowerShell scripts, then straight off you've not got idempotency - you'd need custom code to check if the resource already exists and find out what random suffix you used last time.
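One way around the random suffix problem is to derive the suffix from values that stay the same between runs, which is similar in spirit to the uniqueString function in ARM templates. The following Python sketch is an assumption about how you might do this in your own scripts; the function names and the example subscription ID are made up.

```python
import hashlib


def stable_suffix(*parts, length=8):
    """Derive a short, repeatable suffix from values that do not change between runs."""
    digest = hashlib.sha256("/".join(parts).encode("utf-8")).hexdigest()
    return digest[:length]


def storage_account_name(prefix, subscription_id, resource_group):
    """Combine a prefix with a deterministic suffix (lowercase letters and digits only)."""
    return f"{prefix}{stable_suffix(subscription_id, resource_group)}"


# Hypothetical identifiers: the same inputs always give the same name.
first = storage_account_name("myapp", "00000000-0000-0000-0000-000000000000", "my-rg")
second = storage_account_name("myapp", "00000000-0000-0000-0000-000000000000", "my-rg")
other = storage_account_name("myapp", "00000000-0000-0000-0000-000000000000", "other-rg")

print(first, second, other)
```

Because the name is a pure function of its inputs, rerunning the script targets the resource it created last time instead of creating a new one.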

But it's interesting that we are starting to see this approach to infrastructure as code gaining momentum elsewhere. I've not had a chance to play with Pulumi yet, but it seems to be taking a very similar philosophy - define your infrastructure in JavaScript, taking advantage of the expressiveness, familiarity, reusability and abstractions that a regular programming language can offer.

The Verdict

There are good reasons why ARM templates are still the recommended way to deploy resources to Azure. They help you avoid a lot of pitfalls, and still have a few benefits that are hard to replicate with a scripting or regular programming language. But they come at a cost of complexity and are generally unfriendly for developers to understand and tweak. It feels to me like we're not too far away from code-based approaches being able to offer the same benefits but with a much simpler and more developer-friendly syntax. The Azure CLI already seems very close so long as you take a sensible approach to what additional actions your script performs.

Maybe what's needed is simply a much easier way to generate the templates in the first place - if I can write a very simple script that produces an ARM template, then I don't need to worry about how verbose the resulting template is. It seems to me that's what the Service Fabric Mesh team decided by choosing to create a YAML resource definition that gets compiled into ARM. (Although I'm sure that before long that YAML will start adding DSL-like constructs for things like string manipulation.)
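As a rough sketch of what "a simple script that produces an ARM template" might look like, here is a small Python example that builds a template as ordinary dictionaries and serializes it to JSON. The helper names and the storage account name are hypothetical, and the API version shown is just one that I believe is valid for storage accounts; check the template reference before relying on it.

```python
import json


def storage_account(name, sku="Standard_LRS"):
    """Return the ARM resource definition for a storage account as a plain dict."""
    return {
        "type": "Microsoft.Storage/storageAccounts",
        "apiVersion": "2019-06-01",  # assumed version, verify against the docs
        "name": name,
        "location": "[resourceGroup().location]",
        "sku": {"name": sku},
        "kind": "StorageV2",
        "properties": {},
    }


def build_template(resources):
    """Wrap a list of resource definitions in the standard template envelope."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": resources,
    }


template = build_template([storage_account("mystorageacct")])
template_json = json.dumps(template, indent=2)
print(template_json)
```

The verbose JSON becomes a build artifact, while the part you read and maintain is a few lines of regular code.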

Anyway, thanks for sticking with this rather long and rambling post. I'm sure there's a lot more that could be said on the strengths and weaknesses of both approaches, so I welcome your feedback in the comments!

Comments

January 18. 2019 22:35

One attribute of using ARM templates is that you specify an API version of a resource provider. That way you know that resources are created exactly the same, regardless if newer versions of resource providers are released. Newer version may result in slightly different end result in how things are created.
How can we achieve similar behavior using the Azure CLI? Do we need to keep track of the version of the CLI that we use? That seems difficult. Any thoughts on this?

Richard Waal

January 19. 2019 13:36

There are certainly still some benefits of ARM over Azure CLI and that is likely one of them. I'd hope that in most cases the version of the resource provider doesn't matter too much - you've specified what you really care about with the CLI parameters and you'll get a resource with those characteristics. I'd be interested to know a concrete example of where the version of the resource provider makes a significant difference.

Mark Heath

January 24. 2019 21:48

As I've started digging more and more into the Azure CLI, I've come to a similar verdict. It's great to play around with in a development environment, but when it comes to managing your infrastructure, ARM templates are more desirable. The primary reason is, like you've said, due to idempotency. Great article, thanks for sharing.

Jason Robert

February 17. 2019 10:46

Although the flip side to that is now you have yet another list of dependency versions to maintain. Seems nicer to build in enough testing and/or staged rollout to your release pipeline that you can allow for auto upgrades to the infrastructure, although guess as always it depends.

Matt

February 22. 2021 20:13

It has been a couple of years since you wrote this. Has your approach changed since then? We decided to take the Bash az cli route. Regarding idempotency, we found that when the resource already exists, it will simply perform an update if there are any changes in any properties. We really like being able to reuse existing functions and code and run loops to create batches of resources. Bash is pretty robust in this regard in conjunction with az cli.
Has anyone tried doing a combination of both where you use scripts (az cli or powershell) and ARM Json templates together?

Paul VanRoosendaal