
I recently needed to write some code in C# that could list the contents of a GitHub repository, and download the contents of specific files.

There is actually a GitHub API, which means that this simple URL (https://api.github.com/repos/markheath/azure-deploy-manage-containers/contents) can be used to list all the files in one of my GitHub repos.

Calling this from C# is relatively straightforward, with the exception that you must provide a user agent header or you'll get a 403 response.

Here's some simple sample code that gets the contents of a GitHub repo and displays the URL to download each file (using Newtonsoft.Json to help with the JSON parsing). Notice that for directories, you have to call another URL to get the files in that directory.

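using System;
using System.Net.Http;
using System.Net.Http.Headers;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

var httpClient = new HttpClient();
// GitHub responds with 403 unless a User-Agent header is present
httpClient.DefaultRequestHeaders.UserAgent.Add(
    new ProductInfoHeaderValue("MyApplication", "1"));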
var repo = "markheath/azure-deploy-manage-containers";
var contentsUrl = $"https://api.github.com/repos/{repo}/contents";
var contentsJson = await httpClient.GetStringAsync(contentsUrl);
var contents = (JArray)JsonConvert.DeserializeObject(contentsJson);
foreach(var file in contents)
{
    var fileType = (string)file["type"];
    if (fileType == "dir")
    {
        var directoryContentsUrl = (string)file["url"];
        // use this URL to list the contents of the folder
        Console.WriteLine($"DIR: {directoryContentsUrl}");
    }
    else if (fileType == "file")
    {
        var downloadUrl = (string)file["download_url"];
        // use this URL to download the contents of the file
        Console.WriteLine($"DOWNLOAD: {downloadUrl}");
    }
}
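
To follow the directory URLs as well, one option is to keep a queue of contents URLs still to visit. The following is only a sketch under a few assumptions: RepoWalker and fetchJson are names made up for illustration, and fetchJson stands in for whatever returns the response body for a URL (for example url => httpClient.GetStringAsync(url) with the client configured above).

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

// Hypothetical helper, not part of any GitHub library.
public static class RepoWalker
{
    // Walks the contents listing breadth-first and collects every download_url.
    public static async Task<List<string>> ListDownloadUrlsAsync(
        string contentsUrl, Func<string, Task<string>> fetchJson)
    {
        var downloadUrls = new List<string>();
        var pending = new Queue<string>();
        pending.Enqueue(contentsUrl);

        while (pending.Count > 0)
        {
            var json = await fetchJson(pending.Dequeue());
            foreach (var item in JArray.Parse(json))
            {
                var type = (string)item["type"];
                if (type == "dir")
                {
                    // directories expose another contents URL to list
                    pending.Enqueue((string)item["url"]);
                }
                else if (type == "file")
                {
                    downloadUrls.Add((string)item["download_url"]);
                }
            }
        }
        return downloadUrls;
    }
}
```

Passing the fetch function in keeps the walking logic separate from HttpClient, which also makes it easy to try out against canned JSON. Bear in mind that each directory costs one API request, so a large repo can use up the rate limit for unauthenticated calls quite quickly.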

If you need to fetch the contents of a specific branch, you can simply append ?ref=branchname as a query string parameter.
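
As a small illustration, here is one way the URL could be assembled with an optional folder path and branch. GitHubUrls and ContentsUrl are hypothetical names for this sketch; the only things taken from the API itself are the /contents route and the ref query string parameter.

```csharp
using System;

// Hypothetical helper for building contents URLs.
public static class GitHubUrls
{
    public static string ContentsUrl(string repo, string path = "", string branch = null)
    {
        var url = $"https://api.github.com/repos/{repo}/contents";
        if (!string.IsNullOrEmpty(path))
        {
            // a sub-folder or file is addressed by appending its path
            url += "/" + path.Trim('/');
        }
        if (!string.IsNullOrEmpty(branch))
        {
            // escape the branch name so names like "feature/x" survive
            url += "?ref=" + Uri.EscapeDataString(branch);
        }
        return url;
    }
}
```

For example, GitHubUrls.ContentsUrl("markheath/azure-deploy-manage-containers", branch: "branchname") produces the repo's root listing for that branch.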



If you're building a new application in Azure and want to use a "serverless" approach, what should you use as a database? Obviously, one of the key goals of "serverless" is to avoid having to manage your own servers, so the classic "IaaS" approach of installing a database on a Virtual Machine isn't a good fit. But there are still plenty of great options. I talked about this in my "Building Serverless Applications in Azure" course on Pluralsight, but things have moved on a bit since then so I thought it was worth revisiting the topic.

As I see it, in Azure there are three main database options to choose between:

  • Relational databases - Azure SQL Database being the most obvious choice here
  • Document database - Azure Cosmos DB is Azure's offering in this space
  • The budget option (or "poor man's" database) - You can also use Azure Storage as a primitive database for minimal cost

Relational Databases

For many (if not most) software developers, relational databases are the most familiar, and they are often our go-to option for storing data. They have the advantage of allowing very flexible queries and joins between related entities (hence the name), but do require the schema to be designed up front, and modifying that schema requires some kind of migration to be performed.

Azure offers a choice of relational databases. The main one is Azure SQL Database, which is essentially a fully managed SQL Server in a PaaS offering. But there is also Azure Database for MySQL, Azure Database for MariaDB, and Azure Database for PostgreSQL available if you are more comfortable with working with one of those databases.

Azure SQL Database is a great choice for a serverless application if you do decide that a relational database is the right choice for you. It's really easy to create one and there are several pricing tiers to support everything from a very small and cheap test system, all the way up to a powerful large-scale production system.

Azure SQL Database makes it really easy to enable key features for production scenarios such as encryption at rest with customer managed keys, backing up (with point-in-time restore), and replication to another region. It even comes with a superb query performance insights blade in the Portal that can tell you which of your queries are performing poorly and what indexes could improve them.

One disadvantage of going for a relational database in a serverless Azure application is that it is a little bit trickier to use from Azure Functions. There aren't built-in bindings like there are for Cosmos DB or Azure Storage, so you need to write your own Entity Framework code to access the database.

Another interesting recent development is that there is now a "serverless" pricing tier for Azure SQL Database. This essentially means that if your database is idle for a certain period (at least an hour) it can hibernate to save you money. It can also automatically scale itself up (within predefined limits) to respond to additional load. This might sound perfect for any serverless application but it does come with some caveats.

First, if your database has gone to sleep, there will be a fairly significant "cold start" penalty to wake it up (resuming takes up to a minute). And secondly, if your database never goes to sleep, then this option can work out more expensive. So beware of having scheduled jobs that run every hour with this approach, as your database will never go to sleep.

Document Databases

Document databases are in many ways a perfect fit for serverless architectures. Because you don't need to predefine your schema up front, they allow you to rapidly iterate and evolve your application over time with minimal fuss. Azure Functions come with some built-in bindings to simplify the code needed to read and store data in a document database.

Although Azure offers only a single document database, Cosmos DB, it is an extremely flexible and powerful one. It even supports a variety of different APIs, allowing you to use (for example) the MongoDB API if you're more familiar with that.

One of the most interesting features of Cosmos DB for serverless applications is its concept of a "change feed". This allows you to easily create an Azure Function that can "subscribe" to all changes to documents in a collection. This makes it really easy to generate "materialized views" that allow you to optimize performance and reduce costs of queries.

When Cosmos DB originally came out, the pricing model scared a lot of developers off - the cheapest possible database was three times the cost of the cheapest Azure SQL Database. But things have improved greatly.

Firstly, there is a free tier - allowing you to use a certain amount of resources for free each month which is great for testing and experimenting.

Secondly, Microsoft recently announced a serverless pricing model where billing is based only on the storage and operations you actually consume rather than on provisioned throughput, which could be a good choice for spiky workloads.

Thirdly, you can scale Cosmos DB up and down on the fly, and there is even an "auto-scale" feature that will intelligently scale up and down to save money during idle periods, while meeting demand during peak times.

Using Azure Storage as a poor man's database

Some serverless applications have very simple storage requirements. Maybe you don't often update data, or maybe you don't need rich querying capabilities, and can just look things up by their id.

Azure Storage offers very cheap ways of storing data. For example you could just store data in blobs as JSON or XML files. Or you could use Table Storage, which allows you to store simple table-based documents with a composite key of a "row key" and a "partition key". I've used both options for several small websites and microservices which simply didn't need the cost or complexity of a full database.
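
To make the composite key idea a bit more concrete, here is a rough sketch using the Azure.Data.Tables SDK. The OrderTable class, the "Status" property and the choice of customer id as partition key and order id as row key are all assumptions made up for this example, not a recommendation for any particular data model.

```csharp
using System.Threading.Tasks;
using Azure.Data.Tables;

// Hypothetical example: orders partitioned by customer.
public static class OrderTable
{
    // The partition key groups related rows; the row key identifies one row within it.
    public static TableEntity ToEntity(string customerId, string orderId, string status) =>
        new TableEntity(customerId, orderId)
        {
            ["Status"] = status
        };

    // Writes the entity, then reads it back by its two keys (a cheap point lookup).
    public static async Task<string> SaveAndReadStatusAsync(TableClient table, TableEntity entity)
    {
        await table.CreateIfNotExistsAsync();
        await table.UpsertEntityAsync(entity);
        var loaded = await table.GetEntityAsync<TableEntity>(entity.PartitionKey, entity.RowKey);
        return loaded.Value.GetString("Status");
    }
}
```

A TableClient can be created with new TableClient(connectionString, "orders"), where the connection string and table name are your own. Lookups that supply both keys are the fast path; anything that needs richer querying is a sign you may be outgrowing this option.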

This approach can be a great starting point for a proof-of-concept app, and you can graduate later to a "proper" database as your needs change.

The Hybrid Approach

Of course, there's no reason why you have to pick just one of the above options. Especially if you are using a microservices architecture, each microservice can take its own approach, using the database most appropriate for the type of data it stores.

In fact, you may find that the best approach is hybrid, adding in services like Azure Cognitive Search, Azure Redis Cache, and Blob Indexer. So don't feel that you have to pick just one database type for storing all the data in your serverless application.



In this edition of my series on Azure Service Bus, I want to highlight a few of the "premium" features that Service Bus offers. Many of these aren't necessary if you are just learning and experimenting with Service Bus, but if you are using it for mission critical production systems, then it's well worth taking advantage of some of these capabilities.

Pricing Tiers

Like many Azure services, Service Bus offers a choice of pricing tiers.

The cheapest, Basic, has a very limited feature set - you can only use Queues, not Topics and Subscriptions. But there is no monthly charge, and you only pay for your messaging operations, so it's a kind of "serverless" pricing option.

However, the Standard pricing tier has a lot more features and has a base charge of about $10 a month. It's ideal for using in developer, testing and QA environments, and is also perfectly fine to use in production assuming you don't have very high performance needs or any of the other premium features we'll discuss shortly.

But there is also a Premium tier, which unlocks several additional features in the areas of performance, security, and resilience. Let's look in a bit more detail at these.

Performance

With the standard tier, your messages are handled by shared infrastructure in Azure. This results in limitations on the throughput of messages, and it's possible that you will get throttled. With Premium messaging, the pricing model is based on "messaging units" which are dedicated resources in Azure, giving you predictable throughput and latency. It's possible to scale up and down the number of messaging units depending on load.

Premium messaging also increases the maximum message size from 256KB up to 1MB per message. This can be very useful, but I would also suggest that if your messages are getting that large, it may be worth considering whether there is any inefficiency in your architecture that could be addressed. If huge messages really are necessary, it might be appropriate to use the claim check pattern instead.
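
For anyone unfamiliar with the claim check pattern, the idea is to park the large payload somewhere else (typically blob storage) and send only a small "ticket" through the queue. The sketch below is deliberately free of any Azure SDK calls: IPayloadStore, InMemoryPayloadStore, Envelope and ClaimCheck are all hypothetical names, and the in-memory store is just a stand-in so the logic can run anywhere.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction over wherever large payloads are parked (e.g. a blob container).
public interface IPayloadStore
{
    string Save(byte[] payload);   // returns the claim ticket
    byte[] Load(string claimId);
}

// In-memory stand-in so the sketch runs without any Azure resources.
public class InMemoryPayloadStore : IPayloadStore
{
    private readonly Dictionary<string, byte[]> _blobs = new Dictionary<string, byte[]>();

    public string Save(byte[] payload)
    {
        var id = Guid.NewGuid().ToString("N");
        _blobs[id] = payload;
        return id;
    }

    public byte[] Load(string claimId) => _blobs[claimId];
}

// What travels through the queue: either the body itself or a claim ticket.
public class Envelope
{
    public byte[] Body { get; set; }
    public string ClaimId { get; set; }
}

public static class ClaimCheck
{
    // Assumed threshold based on the 256KB limit mentioned above; a real one
    // should leave headroom because message headers count towards the limit.
    public const int MaxInlineBytes = 256 * 1024;

    // Sender side: small payloads go inline, large ones are stored and referenced.
    public static Envelope Wrap(byte[] payload, IPayloadStore store) =>
        payload.Length <= MaxInlineBytes
            ? new Envelope { Body = payload }
            : new Envelope { ClaimId = store.Save(payload) };

    // Receiver side: redeem the ticket if there is one.
    public static byte[] Unwrap(Envelope envelope, IPayloadStore store) =>
        envelope.ClaimId == null ? envelope.Body : store.Load(envelope.ClaimId);
}
```

In a real system the envelope would be serialized into the Service Bus message, and you would also need to decide who deletes the stored payload once the message has been handled.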

Security

Whichever pricing tier you use to access Service Bus, your messages are sent and stored securely. You can authenticate with a connection string, or use Azure Active Directory service principals. The latter is particularly valuable if you are making use of Managed Identities, as it eliminates the need for your applications to store any connection details at all.
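
As a rough illustration of the connection-string-free approach, the sketch below creates a client from just the namespace host name, using the Azure.Messaging.ServiceBus and Azure.Identity packages. ServiceBusFactory is a made-up name for this example, and it assumes the identity your code runs as has already been granted a suitable role on the namespace.

```csharp
using Azure.Identity;
using Azure.Messaging.ServiceBus;

// Hypothetical factory; only the namespace host name is needed, no secrets.
public static class ServiceBusFactory
{
    // fullyQualifiedNamespace looks like "<your-namespace>.servicebus.windows.net"
    public static ServiceBusClient Create(string fullyQualifiedNamespace) =>
        // DefaultAzureCredential picks up a managed identity when running in Azure
        // and falls back to developer credentials when running locally.
        new ServiceBusClient(fullyQualifiedNamespace, new DefaultAzureCredential());
}
```

From there, client.CreateSender("your-queue") gives you a sender and SendMessageAsync(new ServiceBusMessage("hello")) sends a message, where the queue name is of course your own.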

In many industries it is really important that all data is "encrypted at rest", and ideally using "customer managed encryption keys". It is possible to achieve this by encrypting your own message payloads before sending them and decrypting them on receipt but that is tricky to implement well yourself. With Service Bus Premium it's possible to configure encryption at rest with a customer managed key.

You may also want to restrict access to your Service Bus at the network level, and the Premium tier offers additional capabilities for doing so such as VNet integration and private endpoints. This is important because it's not uncommon for messages to contain sensitive customer data, and so exactly the same stringent protection mechanisms should apply to service busses as you would for databases in a production system.

Resilience

The final capability I want to highlight is to do with resilience. In mission-critical production systems it's vital that there is no single point of failure, and it's all too easy to write code that simply assumes that the Service Bus is going to have 100 percent availability.

But of course in the real world things can and do go wrong, so it's important to have mitigations in place in case there is a problem with your Service Bus. Premium Service Bus allows you to make use of availability zones, meaning that even if there is a failure in one physical location, you will still have access to your Service Bus.

On top of that, it's possible to configure "geo disaster recovery" with Service Bus premium tier. This protects you against some of the more catastrophic failure scenarios by allowing you to pair a secondary Service Bus namespace in another region. The namespace configuration (queues, topics and subscriptions) is replicated there, so if your primary is lost you can fail over to the secondary. Be aware that this pairing replicates the entities rather than the messages held in them.

Summary

While the standard tier of Service Bus is just fine for development and testing and simple production applications, for large business critical systems you should strongly consider taking advantage of the improved performance, security and resilience of Premium tier Service Bus.
