Archiving Blobs with the Storage SDK
If you have huge amounts of data in Azure Blob Storage, you may want to consider reducing your costs by using one of the cheaper "access tiers".
Hot and Cool Tiers
The default access tier is "Hot", which means that the blob is readily available to access at any time. It's intended for blobs that need to be accessed frequently.
You can also move blobs into the "Cool" tier, which has a reduced cost with the tradeoff of slightly lower availability. Its ideal use case is blobs that you will store for at least a month and don't need to access frequently.
Archive Tier
There's another access tier, called "Archive" which provides greatly reduced storage costs, but essentially makes your blobs "offline". To read the contents of a blob in Archive, you must first "rehydrate" it back into the Hot or Cool access tiers, which can take several hours. So you should only use the Archive tier for situations where you can accept a delay if you do need to access the file again in the future.
Note that there is also an "early deletion fee" to pay if you remove something from archive storage within 180 days of putting it there.
Automatic Archiving
One really nice feature that complements these access tiers is the ability to automatically move blobs between access tiers. So for example you could set a rule that any blob that hasn't been accessed for six months gets automatically moved into archive.
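This is configured not through the Blob Storage SDK but with a lifecycle management policy on the storage account. A minimal sketch of such a policy is shown below (the rule name is made up; the trigger here is time since last modification, as a rule based on last access time requires access-time tracking to be enabled on the account):

```json
{
  "rules": [
    {
      "name": "archive-stale-blobs",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": { "blobTypes": [ "blockBlob" ] },
        "actions": {
          "baseBlob": {
            "tierToArchive": { "daysAfterModificationGreaterThan": 180 }
          }
        }
      }
    }
  ]
}
```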
Changing Access Tiers with the SDK
Let's see how we can use the Azure Blob Storage SDK, available in the Azure.Storage.Blobs NuGet package, to move blobs between access tiers.
We'll start off by creating a container client to work with:
var service = new BlobServiceClient(connectionString);
var containerClient = service.GetBlobContainerClient("mycontainer");
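For reference, the examples in this post assume using directives along these lines, and that the container already exists; if it might not, you can create it on demand first (a small sketch):

```csharp
using Azure;                      // RequestFailedException
using Azure.Storage;              // StorageSharedKeyCredential
using Azure.Storage.Blobs;        // BlobServiceClient, BlobContainerClient
using Azure.Storage.Blobs.Models; // AccessTier, BlobUploadOptions
using Azure.Storage.Sas;          // BlobSasBuilder, BlobSasPermissions

// create the container on first run (no-op if it already exists)
await containerClient.CreateIfNotExistsAsync();
```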
Example 1 - Directly Creating a Blob in the Archive Tier
In this example we'll create a new blob (archive.txt) and put it directly into the Archive access tier. You might do this if you are backing up large amounts of data for long-term storage that you don't expect to need to access again.
var archiveBlobClient = containerClient.GetBlobClient("archive.txt");
if (!await archiveBlobClient.ExistsAsync())
{
    var ms = new MemoryStream(Encoding.UTF8.GetBytes("Archive"));
    var uploadOptions = new BlobUploadOptions()
        { AccessTier = AccessTier.Archive };
    await archiveBlobClient.UploadAsync(ms, uploadOptions);
}
Example 2 - Moving an Existing Blob to the Archive Tier
In this example we're going to use a blob called moveToArchive.txt. If it doesn't exist, we'll create it (in the default Hot access tier), but then immediately ask for it to be archived by calling SetAccessTierAsync.
If it does exist, we'll simply find out what tier it is in by calling GetPropertiesAsync and checking the value of AccessTier. Moving a blob into Archive should happen immediately.
var moveToArchiveBlobClient = containerClient.GetBlobClient("moveToArchive.txt");
if (!await moveToArchiveBlobClient.ExistsAsync())
{
    // upload new file to hot access tier
    var ms = new MemoryStream(Encoding.UTF8.GetBytes("Move to Archive"));
    await moveToArchiveBlobClient.UploadAsync(ms);
    // then move it into the Archive access tier
    await moveToArchiveBlobClient.SetAccessTierAsync(AccessTier.Archive);
}
else
{
    // already exists - check what tier it is in
    var props = await moveToArchiveBlobClient.GetPropertiesAsync();
    if (props.Value.AccessTier == "Archive")
    {
        Console.WriteLine("File has successfully been moved to Archive");
    }
}
Example 3 - Rehydrating a Blob from Archive
This example is a little more involved. We're going to create a blob called rehydrateme.txt in the Archive tier. If the blob already exists, we'll check what access tier it is in. If it is still in the Archive tier and we haven't requested rehydration yet, we'll do so by calling SetAccessTierAsync.
We can tell if a rehydration is in progress by checking the ArchiveStatus property, which will have the value rehydrate-pending-to-hot or rehydrate-pending-to-cool. In that case we simply need to wait until the AccessTier property changes from Archive.
We can also inspect the value of AccessTierChangedOn to know when the access tier last changed for the blob.
var rehydrateBlobClient = containerClient.GetBlobClient("rehydrateme.txt");
if (!await rehydrateBlobClient.ExistsAsync())
{
    Console.WriteLine("Creating file in archive to rehydrate");
    var ms = new MemoryStream(Encoding.UTF8.GetBytes("Rehydrate"));
    // upload directly as archive
    var uploadOptions = new BlobUploadOptions()
        { AccessTier = AccessTier.Archive };
    await rehydrateBlobClient.UploadAsync(ms, uploadOptions);
}
else
{
    // already exists - check what tier it is in
    var props = await rehydrateBlobClient.GetPropertiesAsync();
    if (props.Value.AccessTier == "Archive")
    {
        if (props.Value.ArchiveStatus == null)
        {
            Console.WriteLine("Requesting rehydrate");
            await rehydrateBlobClient.SetAccessTierAsync(AccessTier.Hot);
        }
        else
        {
            Console.WriteLine($"Still rehydrating... {props.Value.ArchiveStatus}, changed {props.Value.AccessTierChangedOn}");
        }
    }
    else
    {
        Console.WriteLine($"Rehydrated blob is now in {props.Value.AccessTier}");
    }
}
When I ran this to test it out, the blob was still rehydrating several hours after I changed access tiers. I'm planning to do some more tests to find out what the average duration of a rehydration is, but it does appear that you should expect hours rather than minutes.
According to the documentation, it can take up to 15 hours. If you need it faster, you can pass a RehydratePriority of High to the SetAccessTierAsync method, which should mean that your blob is available within an hour.
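As a quick sketch (reusing the rehydrateBlobClient from the example above), the priority is supplied via the optional rehydratePriority parameter:

```csharp
// request a high-priority rehydration back into the Hot tier;
// standard priority can take up to 15 hours
await rehydrateBlobClient.SetAccessTierAsync(
    AccessTier.Hot,
    rehydratePriority: RehydratePriority.High);
```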
What happens when we try to read a blob in the Archive tier?
Finally, you might be wondering what happens if you have existing code that tries to read the contents of a blob in the Archive access tier. The answer is that it will get back a 409 error ("This operation is not permitted on an archived blob").
So, for example, this code, which directly uses the blob client to read a blob in the Archive access tier, will throw a RequestFailedException:
try
{
    var b = await archiveBlobClient.DownloadAsync();
    using var s = new StreamReader(b.Value.Content);
    var contents = await s.ReadToEndAsync();
    Console.WriteLine("Contents of archive blob: " + contents);
}
catch (RequestFailedException e)
{
    Console.WriteLine("FAILED TO DOWNLOAD ARCHIVE BLOB: " + e.Message);
}
And this code, which generates a readable SAS URI for a blob in Archive and then attempts to use an HttpClient to download the contents of that blob, will get an HttpRequestException with a 409 error.
// generate a SAS Uri
var accountName = Regex.Match(connectionString, "AccountName=([^;]+);").Groups[1].Value;
var accountKey = Regex.Match(connectionString, "AccountKey=([^;]+);").Groups[1].Value;
var cred = new StorageSharedKeyCredential(accountName, accountKey);
var sasBuilder = new BlobSasBuilder();
sasBuilder.BlobContainerName = archiveBlobClient.BlobContainerName;
sasBuilder.BlobName = archiveBlobClient.Name;
sasBuilder.SetPermissions(BlobSasPermissions.Read);
sasBuilder.ExpiresOn = DateTimeOffset.Now.AddHours(1);
var qparams = sasBuilder.ToSasQueryParameters(cred);
var sasUri = $"{archiveBlobClient.Uri}?{qparams}";
// now try to read from the archive SAS Uri
var h = new HttpClient();
try
{
    var contentSas = await h.GetStringAsync(sasUri);
    Console.WriteLine("SAS content of archive blob: " + contentSas);
}
catch (HttpRequestException hrx)
{
    // will get a 409 error
    Console.WriteLine("FAILED TO DOWNLOAD ARCHIVE SAS: " + hrx.Message);
}
Summary
The Azure Blob Storage SDK makes it very simple to move blobs between access tiers. This gives you the potential for substantial cost savings if you are storing vast amounts of data that you rarely need to access. However, you may need to rework your application to cope with the delay of moving blobs out of the Archive tier.
Comments
I tried rehydrating a few blobs from archive, which took around 8-10 hours. But a few of the rehydrated blobs are going back to the archive tier. What may be the reason?
Monk