Tracking Token Usage with Microsoft.Extensions.AI

Continuing in my recent series of posts about calling LLMs with C# using the Microsoft.Extensions.AI NuGet package, in this post I want to discuss how to track the cost of your API calls to AI models.

Each AI provider will have its own pricing structure for the various models it offers. For example, here are pricing pages for Azure OpenAI service, OpenAI's own API, Anthropic's API, Google Gemini API, DeepSeek API, Amazon Bedrock, and xAI Grok. As you can see there is a huge choice available, but the pricing is always based on the number of "tokens" you use.
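To make the link between tokens and money concrete, here is a minimal sketch of the arithmetic. It assumes the common scheme of separate prices per million input and output tokens. The prices and the `EstimateCost` helper are made up for illustration and are not any provider's real rates or part of Microsoft.Extensions.AI.

```csharp
using System;

// Hypothetical prices in US dollars per one million tokens.
// These are placeholders, not any provider's real rates.
const decimal InputPricePerMillion = 0.15m;
const decimal OutputPricePerMillion = 0.60m;

// Estimate the cost of one call from its input and output token counts.
static decimal EstimateCost(long inputTokens, long outputTokens,
    decimal inputPricePerMillion, decimal outputPricePerMillion)
{
    return inputTokens * inputPricePerMillion / 1_000_000m
         + outputTokens * outputPricePerMillion / 1_000_000m;
}

// Example: a call that used 1,000 input tokens and 200 output tokens.
var cost = EstimateCost(1000, 200, InputPricePerMillion, OutputPricePerMillion);
Console.WriteLine($"Estimated cost: {cost} USD");
```

Using `decimal` rather than `double` keeps the very small per-call amounts exact when you add up many of them.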

Tokens

Whenever you construct a chat message, the text (or image, etc.) in your message is turned into a number of tokens representing your message. If you're including the chat history, those messages are passed in as additional input tokens, so if your chat goes on for a long time, each subsequent message will use more and more tokens. The response from the AI model also uses a number of tokens, so if you only need a short answer, saying so in your system prompt should help reduce costs.
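The effect of resending the chat history can be sketched with some simple arithmetic. The example below assumes, purely for illustration, that every user message and every reply is exactly 50 tokens and that there is no system prompt. Real token counts vary by model and content, and `InputTokensForTurn` is a hypothetical helper.

```csharp
using System;

// Assumed, made-up size: every user message and every reply is 50 tokens.
const int TokensPerMessage = 50;

// Input tokens sent on a given turn (1-based) when the full history is resent:
// all earlier user messages and replies, plus the new user message.
static int InputTokensForTurn(int turn, int tokensPerMessage)
{
    var previousMessages = 2 * (turn - 1); // one user message and one reply per earlier turn
    return (previousMessages + 1) * tokensPerMessage;
}

var cumulative = 0;
for (var turn = 1; turn <= 5; turn++)
{
    var input = InputTokensForTurn(turn, TokensPerMessage);
    cumulative += input;
    Console.WriteLine($"Turn {turn}: {input} input tokens, {cumulative} cumulative");
}
```

Under these assumptions the fifth turn alone sends 450 input tokens, nine times as many as the first, and the five turns together send 1,250.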

In previous posts I showed how you could create a List<ChatMessage> and pass that to IChatClient.GetStreamingResponseAsync to receive the response from the model and display it as it is generated. But how can you keep track of how many tokens you are using? The Microsoft.Extensions.AI package exposes a UsageDetails object, which is very easy to obtain if you use GetResponseAsync, as that returns a ChatResponse which has a Usage property.
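For the non-streaming case, a minimal sketch might look like the following. It assumes a version of the Microsoft.Extensions.AI package in which GetResponseAsync returns a ChatResponse with a nullable Usage property, as described above. The method name `GetResponseWithUsage` is made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

// Sends the chat history without streaming and prints the token usage.
// The method name is hypothetical; only GetResponseAsync and Usage come from the package.
static async Task<ChatResponse> GetResponseWithUsage(IChatClient chatClient,
    List<ChatMessage> chatHistory)
{
    var response = await chatClient.GetResponseAsync(chatHistory);

    // Usage may be null if the provider did not report token counts.
    var usage = response.Usage;
    if (usage != null)
    {
        Console.WriteLine($"  InputTokenCount: {usage.InputTokenCount}");
        Console.WriteLine($"  OutputTokenCount: {usage.OutputTokenCount}");
        Console.WriteLine($"  TotalTokenCount: {usage.TotalTokenCount}");
    }
    return response;
}
```

The trade-off is that the whole response arrives at once, so nothing can be displayed while the model is still generating.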

Getting `UsageDetails` from `GetStreamingResponseAsync`

Things are slightly more complicated when calling GetStreamingResponseAsync, as that returns a series of ChatResponseUpdate objects. Typically the final one of these will have an instance of UsageContent in its Contents property.

In the following code example, I show how to get hold of this UsageContent so it can be printed out once the chat client's response has been received:

// Streams the model's reply to the console and captures the token usage
// reported during the stream.
async Task<ChatMessage> GetResponse(IChatClient chatClient,
    List<ChatMessage> chatHistory)
{
    Console.WriteLine($"AI Response started at {DateTime.Now}:");
    var response = "";
    UsageDetails? usageDetails = null;
    await foreach (var item in
        chatClient.GetStreamingResponseAsync(chatHistory))
    {
        Console.Write(item.Text);
        response += item.Text;

        // Usage normally arrives as a UsageContent item on the final update.
        var usage = item.Contents.OfType<UsageContent>()
            .FirstOrDefault()?.Details;
        if (usage != null) usageDetails = usage;
    }
    Console.WriteLine($"\nAI Response completed at {DateTime.Now}:");

    // usageDetails stays null if the provider never sent usage information.
    ShowUsageDetails(usageDetails);
    return new ChatMessage(ChatRole.Assistant, response);
}

Exploring `UsageDetails`

UsageDetails includes properties for the input and output token counts (along with the total), as well as an optional additional properties dictionary (which may itself contain further dictionaries). This lets different AI providers include extra metadata that can help you understand pricing. For the Azure OpenAI service, this included a ReasoningTokenCount, which was 0 in my tests using the gpt-4o-mini model. The o1 and o1-mini models, however, can be told how many "reasoning tokens" they are allowed to use for a particular request, so you might increase that for a particularly tricky problem.

Here's my code to display the contents of the UsageDetails object:

void ShowUsageDetails(UsageDetails? usage)
{
    if (usage != null)
    {
        Console.WriteLine($"  InputTokenCount: {usage.InputTokenCount}");
        Console.WriteLine($"  OutputTokenCount: {usage.OutputTokenCount}");
        Console.WriteLine($"  TotalTokenCount: {usage.TotalTokenCount}");
        if (usage.AdditionalProperties != null)
        {
            ShowNestedDictionary(usage.AdditionalProperties!, "    ");
        }
    }
}

void ShowNestedDictionary(IDictionary<string, object> dictionary, string indent)
{
    foreach (var (key, value) in dictionary)
    {
        if (value is IDictionary<string, object> nestedDictionary)
        {
            Console.WriteLine($"{indent}{key}:");
            ShowNestedDictionary(nestedDictionary, indent + "    ");
        }
        else
        {
            Console.WriteLine($"{indent}{key}: {value}");
        }
    }
}

Testing it out

Let's run a query through this code and see how many tokens we use. To keep costs to a minimum I'll guide the AI to give me a one-word answer (with an extra hint just to make sure it gives me a good answer)!

Your prompt:
Who will win the world cup? 1 word answer. Must start with "E"
AI Response started at 11/01/2025 16:52:05:
England.
AI Response completed at 11/01/2025 16:52:06:
  InputTokenCount: 43
  OutputTokenCount: 2
  TotalTokenCount: 45
    OutputTokenDetails:
        ReasoningTokenCount: 0
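
Per-call counts like the ones printed above can be summed to keep a running total for a whole session. The sketch below uses plain tuples rather than any Microsoft.Extensions.AI type. The first pair of counts is the call shown above, and the other two are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Per-call (input, output) token counts. The first pair is the call shown above;
// the other two are invented to illustrate the running total.
var calls = new List<(long Input, long Output)>
{
    (43, 2),
    (60, 15),
    (90, 8),
};

long totalInput = 0;
long totalOutput = 0;
foreach (var (input, output) in calls)
{
    totalInput += input;
    totalOutput += output;
}

Console.WriteLine($"Calls: {calls.Count}");
Console.WriteLine($"Total input tokens: {totalInput}");
Console.WriteLine($"Total output tokens: {totalOutput}");
Console.WriteLine($"Total tokens: {totalInput + totalOutput}");
```

Keeping input and output totals separate matters because providers commonly charge different rates for each.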

Summary

If you're planning to use an AI model in production, then it's really important for you to keep track of cost. You want to minimize the number of tokens you use, and avoid the more expensive models if cheaper ones are capable of doing what you need. The UsageDetails capability of Microsoft.Extensions.AI allows you to know exactly how many tokens each of your interactions with the model used.
