UPDATE - this post refers to an older version of the Azure Blob Storage SDK. See here for an updated version of this article.

The Microsoft.Azure.Storage.Blob NuGet package makes it really easy to work with Azure Blobs in .NET. Recently I was troubleshooting some performance issues with copying very large blobs between containers, and discovered that we were not copying blobs in the optimal way.

To copy files with the Azure Blob storage SDK, you first get references to the source and destination blobs like this:

// requires: using Microsoft.Azure.Storage; using Microsoft.Azure.Storage.Blob;
var storageAccount = CloudStorageAccount.Parse(connectionString);
var blobClient = storageAccount.CreateCloudBlobClient();

// details of our source file
var sourceContainerName = "source";
var sourceFilePath = "folder/test.zip";

// details of where we want to copy to
var destContainerName = "dest";
var destFilePath = "somewhere/test.zip";

var sourceContainer = blobClient.GetContainerReference(sourceContainerName);
var destContainer = blobClient.GetContainerReference(destContainerName);

CloudBlockBlob sourceBlob = sourceContainer.GetBlockBlobReference(sourceFilePath);
CloudBlockBlob destBlob = destContainer.GetBlockBlobReference(destFilePath);

At this point, you might be tempted to copy the blob with code like this:

using (var sourceStream = await sourceBlob.OpenReadAsync())
using (var destStream = await destBlob.OpenWriteAsync())
{
    await sourceStream.CopyToAsync(destStream);
}

Or maybe with the convenient UploadFromStreamAsync method:

using (var sourceStream = await sourceBlob.OpenReadAsync())
{
    await destBlob.UploadFromStreamAsync(sourceStream);
}

However, both of those examples download the entire contents of the source blob and re-upload them to the target blob. For the files I was working with, this took about 20 minutes.

Copying the quick way

Let's see how to copy the blob the quick way:

await destBlob.StartCopyAsync(sourceBlob);

Not only is it trivially simple, but it completes almost instantaneously. And that's because when you're copying a blob within a storage account, the underlying platform doesn't need to make a new copy - it can just update a reference internally.

The name of the method (StartCopyAsync) might make you feel a bit nervous. It implies that the call may return before the copy has completed. And that can happen if you're copying between storage accounts.

Copying between storage accounts

To copy between storage accounts, you still use the StartCopyAsync method, but pass the Uri of the source blob. The documentation is a bit sparse, but here's how I was able to get it to work.

Notice in this example that the destination needs its own CloudStorageAccount and CloudBlobClient, separate from the ones used for the source file. We then create a read-only SAS token for the source blob that stays valid long enough for the copy to take place. Finally, after calling StartCopyAsync, we need to keep track of the copy progress by checking the CopyState of the blob, calling FetchAttributesAsync each time to get the latest value.

// create the blob client for the destination storage account
var destStorageAccount = CloudStorageAccount.Parse(destConnectionString);
var destClient = destStorageAccount.CreateCloudBlobClient();

// destination container now uses the destination blob client
destContainer = destClient.GetContainerReference(destContainerName);

// create a 2 hour SAS token for the source file
var sas = sourceBlob.GetSharedAccessSignature(new SharedAccessBlobPolicy
{
    Permissions = SharedAccessBlobPermissions.Read,
    SharedAccessStartTime = DateTimeOffset.Now.AddMinutes(-5),
    SharedAccessExpiryTime = DateTimeOffset.Now.AddHours(2)
});

// start the copy, passing the source blob's URI with the SAS token appended
destBlob = destContainer.GetBlockBlobReference(destFilePath);
var sourceUri = new Uri(sourceBlob.Uri + sas);
await destBlob.StartCopyAsync(sourceUri);

// copy may not be finished at this point, check on the status of the copy
while (destBlob.CopyState.Status == CopyStatus.Pending)
{
    await Task.Delay(1000);
    await destBlob.FetchAttributesAsync();
}

if (destBlob.CopyState.Status != CopyStatus.Success)
{
    throw new InvalidOperationException($"Copy failed: {destBlob.CopyState.Status}");
}

If you do need to cancel the copy for any reason, you can get hold of the CopyId from the blob's CopyState and pass that to the AbortCopyAsync method on the CloudBlockBlob.
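
As a rough sketch of how that might look, the helper below polls the copy and aborts it if it is still pending after a timeout. The class name BlobCopyHelper, the method name WaitForCopyAsync and the timeout parameter are my own inventions for illustration, not part of the SDK. The calls it makes (FetchAttributesAsync, CopyState and AbortCopyAsync) are the ones described above.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Storage.Blob;

public static class BlobCopyHelper
{
    // Hypothetical helper: waits for a server-side copy to finish and
    // aborts it if it is still pending once the timeout has elapsed.
    public static async Task WaitForCopyAsync(CloudBlockBlob destBlob, TimeSpan timeout)
    {
        var deadline = DateTimeOffset.UtcNow + timeout;

        // CopyState is a local snapshot; refresh it from the service first
        await destBlob.FetchAttributesAsync();

        while (destBlob.CopyState.Status == CopyStatus.Pending)
        {
            if (DateTimeOffset.UtcNow > deadline)
            {
                // cancel the pending copy using the CopyId from the CopyState
                await destBlob.AbortCopyAsync(destBlob.CopyState.CopyId);
                throw new TimeoutException($"Copy aborted after {timeout}");
            }

            await Task.Delay(TimeSpan.FromSeconds(1));
            await destBlob.FetchAttributesAsync();
        }

        if (destBlob.CopyState.Status != CopyStatus.Success)
        {
            throw new InvalidOperationException(
                $"Copy failed: {destBlob.CopyState.Status} {destBlob.CopyState.StatusDescription}");
        }
    }
}
```

You would call it straight after StartCopyAsync, for example with await BlobCopyHelper.WaitForCopyAsync(destBlob, TimeSpan.FromMinutes(30)). Bear in mind that the abort call itself can fail if the copy happens to complete between the last status check and the abort request, so you may want to catch StorageException around it.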

Uploading local files

Obviously, if you're uploading local files into blob storage, you can't use StartCopyAsync, but the convenient UploadFromFileAsync method will upload from a local file for you.

await destBlob.UploadFromFileAsync("mylocalfile.zip");

Unfortunately, at the time of writing, the current version of the blob storage SDK (10.0.0.3) is susceptible to out-of-memory exceptions when uploading huge files. Hopefully that will get resolved soon.

Comments

Comment by Drew Merk

+1 for the reminder to Fetch Attributes every time you check the CopyState! My code was getting stuck in an infinite loop.
