Efficiently transfer C# objects to and from Azure Blob Storage

Photo by Jeff DeWitt on Unsplash

Azure Blob Storage is primarily designed to store files, and its client API is naturally geared towards working with files and streams.

Recently I had to read and write in-memory objects using blob storage. I could not find much written about doing this efficiently with the API, so I set out to find a way, or rather stumbled into one.

Here was the first pass at reading from blob storage:

using Newtonsoft.Json;

// T is declared on the method so the caller picks the type to deserialize into
public async Task<T> ReadDataAsync<T> (string blobId, CancellationToken c)
{
    var client = containerClient.GetBlobClient(blobId);
    using var stream = await client.OpenReadAsync(null, c);
    using var streamReader = new StreamReader(stream);
    using var json = new JsonTextReader(streamReader);
    return JsonSerializer.CreateDefault().Deserialize<T>(json);
}

This is okay performance-wise: StreamReader and JsonTextReader read the blob as a stream instead of loading it all into memory first. But the code is too verbose. We can do better.

Here is the next pass at reading from blob storage:

using System.Text.Json;

public async Task<T> ReadDataAsync<T> (string blobId, CancellationToken c)
{
    var client = containerClient.GetBlobClient(blobId);
    using var stream = await client.OpenReadAsync(null, c);
    // Deserialize straight from the blob stream; default serializer options are used
    return await JsonSerializer.DeserializeAsync<T>(stream, cancellationToken: c);
}

This is much better. We are now using the newer System.Text.Json library, which tidies up the code by reducing the 3 lines of reader setup from the first pass to a single line. Performance-wise it is about the same as before.
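
To try the stream-based deserialization without a storage account, here is a minimal sketch that uses a MemoryStream as a stand-in for the stream returned by OpenReadAsync. The SomeData shape, the StreamJson helper, and the Demo class are hypothetical names made up for this illustration, not part of the Azure SDK.

```csharp
using System.IO;
using System.Text;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical payload type used only for this illustration
public class SomeData
{
    public string Name { get; set; }
    public int Count { get; set; }
}

// Hypothetical helper: reads UTF-8 JSON from any readable stream
public static class StreamJson
{
    public static async Task<T> ReadAsync<T>(Stream source, CancellationToken c = default)
    {
        // The serializer pulls from the stream in chunks rather than needing one big string
        return await JsonSerializer.DeserializeAsync<T>(source, cancellationToken: c);
    }
}

public static class Demo
{
    public static async Task<SomeData> RoundTripAsync()
    {
        // A MemoryStream stands in for the blob stream here (assumption for the demo)
        byte[] utf8 = Encoding.UTF8.GetBytes("{\"Name\":\"widget\",\"Count\":3}");
        using var stream = new MemoryStream(utf8);
        return await StreamJson.ReadAsync<SomeData>(stream);
    }
}
```

Because the helper only depends on Stream, the same call should work unchanged when the stream comes from blob storage instead.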

Now let's see the first pass at writing to blob storage:

using Newtonsoft.Json;

public async Task WriteAsync (SomeData data, string blobId)
{
    var client = containerClient.GetBlobClient(blobId);
    await using var ms = new MemoryStream();
    var json = JsonConvert.SerializeObject(data);
    var writer = new StreamWriter(ms);
    await writer.WriteAsync(json);
    await writer.FlushAsync();
    ms.Position = 0;
    await client.UploadAsync(ms);
}

This is subpar, to say the least, as the PR comment put it. That's exactly what happened to me: I wrote this code and the reviewer recommended that I optimize it. Here is why this approach is not so good. First the data is serialized to a JSON string, next that string is written into a MemoryStream, and finally, after the stream position is reset to 0, the stream is uploaded to blob storage. Counting the original object, the JSON string, and the stream, that is roughly 3x the memory of the original data. We can definitely do better.
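
To make the extra copies visible, here is a small sketch that measures the two intermediate buffers the first pass creates. It uses System.Text.Json instead of Newtonsoft.Json to stay dependency-free, and CopyCounter is a hypothetical helper name. The numbers are only an approximation of real memory use, since a MemoryStream's internal buffer can be larger than its Length.

```csharp
using System.IO;
using System.Text.Json;

// Hypothetical helper that reports the size of each intermediate copy
public static class CopyCounter
{
    public static (long JsonStringBytes, long StreamBytes) Measure<T>(T data)
    {
        // Copy 1: the whole payload as a .NET string (UTF-16, 2 bytes per char)
        string json = JsonSerializer.Serialize(data);
        long jsonStringBytes = (long)json.Length * sizeof(char);

        // Copy 2: the same payload again, encoded as UTF-8 inside a MemoryStream
        using var ms = new MemoryStream();
        var writer = new StreamWriter(ms);
        writer.Write(json);
        writer.Flush();

        return (jsonStringBytes, ms.Length);
    }
}
```

For example, CopyCounter.Measure(new[] { 1, 2, 3 }) serializes to the 7-character string [1,2,3], so it reports 14 bytes for the string and 7 bytes for the stream, both held in memory on top of the original object.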

Now the second pass at writing to blob storage

using System.Text.Json;

public async Task WriteAsync<T> (T data, string blobId)
{
    var client = containerClient.GetBlobClient(blobId);
    await using var ms = new MemoryStream();
    await JsonSerializer.SerializeAsync(ms, data);
    ms.Position = 0;
    await client.UploadAsync(ms);
}

I would say that is some improvement over the first pass: instead of 3x memory it is now down to 2x, since the intermediate JSON string is gone. But we can do better. With that PR review encouragement, here is the final pass at writing to blob storage. Big thanks to this Stack Overflow answer, which made use of Azure.Storage.Blobs.Specialized: stackoverflow.com/questions/62279770/how-to..

using System.Text.Json;
using Azure.Storage.Blobs.Specialized;

public async Task WriteAsync<T> (T data, string blobId)
{
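    var client = containerClient.GetBlockBlobClient(blobId);
    // Dispose the stream when done so the written data is flushed and committed to the blob
    await using var stream = await client.OpenWriteAsync(true);
    await JsonSerializer.SerializeAsync(stream, data);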
}

The difference from the second pass is that we call GetBlockBlobClient instead of GetBlobClient and then OpenWriteAsync, which gives us a stream that writes directly to the blob. The serializer writes into that stream as it goes, so there is no full intermediate copy of the JSON in memory. And voila, roughly 1x memory and the data is uploaded to blob storage.
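
For context, here is one way the write method could be wired into a class, sketched under a few assumptions: BlobJsonStore is a hypothetical wrapper name, the containerClient field is assumed to be a BlobContainerClient, and setting the content type through BlockBlobOpenWriteOptions is my own addition rather than something from the Stack Overflow answer.

```csharp
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using Azure.Storage.Blobs.Specialized;

// Hypothetical wrapper class; the name and constructor are assumptions for this sketch
public class BlobJsonStore
{
    private readonly BlobContainerClient containerClient;

    public BlobJsonStore(BlobContainerClient containerClient)
    {
        this.containerClient = containerClient;
    }

    public async Task WriteAsync<T>(T data, string blobId, CancellationToken c = default)
    {
        var client = containerClient.GetBlockBlobClient(blobId);

        // Optional: label the blob as JSON so downstream tools see the right content type
        var writeOptions = new BlockBlobOpenWriteOptions
        {
            HttpHeaders = new BlobHttpHeaders { ContentType = "application/json" }
        };

        // overwrite: true replaces any existing blob with the same name.
        // Disposing the stream flushes what is left and commits the blob.
        await using var stream = await client.OpenWriteAsync(true, writeOptions, c);
        await JsonSerializer.SerializeAsync(stream, data, cancellationToken: c);
    }
}

// Usage, assuming a connection string and an existing container named "my-container":
//   var store = new BlobJsonStore(new BlobContainerClient(connectionString, "my-container"));
//   await store.WriteAsync(new { Name = "widget", Count = 3 }, "widget.json");
```

Passing the CancellationToken through both calls mirrors what the read method already does, so a cancelled request should stop the upload as well.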

With that, we are finally at the most performant code that I can think of! Let me know if there is something even better than this.

Here is the final code for both read and write:

using System.Text.Json;
using Azure.Storage.Blobs.Specialized;

public async Task<T> ReadDataAsync<T> (string blobId, CancellationToken c)
{
    var client = containerClient.GetBlobClient(blobId);
    using var stream = await client.OpenReadAsync(null, c);
    // Deserialize straight from the blob stream; default serializer options are used
    return await JsonSerializer.DeserializeAsync<T>(stream, cancellationToken: c);
}

public async Task WriteAsync<T> (T data, string blobId)
{
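    var client = containerClient.GetBlockBlobClient(blobId);
    // Dispose the stream when done so the written data is flushed and committed to the blob
    await using var stream = await client.OpenWriteAsync(true);
    await JsonSerializer.SerializeAsync(stream, data);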
}