Tracking Token Usage with Microsoft.Extensions.AI
Continuing in my recent series of posts about calling LLMs with C# using the Microsoft.Extensions.AI NuGet package, in this post I want to discuss how to track the cost of your API calls to AI models.
Each AI provider will have its own pricing structure for the various models it offers. For example, here are pricing pages for Azure OpenAI service, OpenAI's own API, Anthropic's API, Google Gemini API, DeepSeek API, Amazon Bedrock, and xAI Grok. As you can see there is a huge choice available, but the pricing is always based on the number of "tokens" you use.
Tokens
Whenever you construct a chat message, the text (or image, etc.) in your message is turned into a number of tokens. If you're including the chat history, those earlier messages are passed in as additional input tokens, so if your chat goes on for a long time, each subsequent message will use more and more tokens. The response from the AI model also uses a number of tokens, so if you only need a short answer, your system prompt should probably say so to reduce costs.
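To make the effect of resending the chat history concrete, here is a minimal sketch of the arithmetic. The `ChatHistoryTokens` class and the token counts passed to it are hypothetical, and real providers may add a small per-message overhead that this ignores.

```csharp
using System;
using System.Collections.Generic;

public static class ChatHistoryTokens
{
    // Returns the number of input tokens sent on each turn, assuming the
    // full history is resent with every request. All counts are made-up
    // values supplied by the caller, not real tokenizer output.
    public static List<int> InputTokensPerTurn(
        int systemPromptTokens,
        IReadOnlyList<(int UserTokens, int AssistantTokens)> turns)
    {
        var result = new List<int>();
        var historyTokens = systemPromptTokens;
        foreach (var (userTokens, assistantTokens) in turns)
        {
            // This request carries everything so far plus the new user message.
            historyTokens += userTokens;
            result.Add(historyTokens);
            // The model's reply joins the history for the next request.
            historyTokens += assistantTokens;
        }
        return result;
    }
}
```

With a 20-token system prompt and three turns of 10 user tokens and 50 assistant tokens each, the input counts come out as 30, 90 and 150, even though every user message is the same size.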
In previous posts I showed how you could create a List<ChatMessage> and pass that to IChatClient.CompleteStreamingAsync to receive the response from the model and display it as it is generated. But how can you keep track of how many tokens you are using? The Microsoft.Extensions.AI package exposes a UsageDetails object, which is very easy to get hold of if you use CompleteAsync, as that returns a ChatCompletion which has a Usage property.
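For the non-streaming case, a minimal sketch might look like the following. It assumes the same preview version of the package used in the rest of this post, where CompleteAsync returns a ChatCompletion with Message and Usage properties. The `NonStreamingUsage` class and `GetResponseWithUsage` method are names I've made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

public static class NonStreamingUsage
{
    // Sends the chat history in a single non-streaming call, prints the
    // reply, then prints whatever token usage the provider reported.
    public static async Task<ChatMessage> GetResponseWithUsage(
        IChatClient chatClient, List<ChatMessage> chatHistory)
    {
        ChatCompletion completion = await chatClient.CompleteAsync(chatHistory);
        Console.WriteLine(completion.Message.Text);

        // Usage can be null if the provider does not report token counts.
        UsageDetails? usage = completion.Usage;
        if (usage != null)
        {
            Console.WriteLine($" InputTokenCount: {usage.InputTokenCount}");
            Console.WriteLine($" OutputTokenCount: {usage.OutputTokenCount}");
            Console.WriteLine($" TotalTokenCount: {usage.TotalTokenCount}");
        }
        return completion.Message;
    }
}
```

The trade-off is that nothing is displayed until the whole response has been generated, which is why the streaming approach is often nicer for interactive chat.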
Getting UsageDetails from CompleteStreamingAsync
Things are slightly more complicated when calling CompleteStreamingAsync, as that returns a series of StreamingChatCompletionUpdate objects. Typically the final one of these will have an instance of UsageContent in its Contents property.
In the following code example, I show how to get hold of this UsageContent so it can be printed out once the chat client's response has been received:
async Task<ChatMessage> GetResponse(IChatClient chatClient,
    List<ChatMessage> chatHistory)
{
    Console.WriteLine($"AI Response started at {DateTime.Now}:");
    var response = "";
    UsageDetails? usageDetails = null;
    await foreach (var item in
        chatClient.CompleteStreamingAsync(chatHistory))
    {
        // show each chunk as it arrives and build up the full response
        Console.Write(item.Text);
        response += item.Text;
        // usage typically arrives as a UsageContent item in the final update
        var usage = item.Contents.OfType<UsageContent>()
            .FirstOrDefault()?.Details;
        if (usage != null) usageDetails = usage;
    }
    Console.WriteLine($"\nAI Response completed at {DateTime.Now}:");
    ShowUsageDetails(usageDetails);
    return new ChatMessage(ChatRole.Assistant, response);
}
Exploring UsageDetails
UsageDetails includes properties for input and output token counts (along with the total), as well as an optional additional properties dictionary (which itself may contain further dictionaries), allowing different AI providers to include additional metadata that can help you understand pricing. For the Azure OpenAI service, this included a ReasoningTokenCount, which was 0 in my tests using the gpt-4o-mini model, but the o1 and o1-mini models can be told how many "reasoning tokens" they are allowed to use for a particular request, so you might increase that for a particularly tricky problem.
Here's my code to display the contents of the UsageDetails object:
void ShowUsageDetails(UsageDetails? usage)
{
    if (usage != null)
    {
        Console.WriteLine($" InputTokenCount: {usage.InputTokenCount}");
        Console.WriteLine($" OutputTokenCount: {usage.OutputTokenCount}");
        Console.WriteLine($" TotalTokenCount: {usage.TotalTokenCount}");
        // provider-specific extras (e.g. reasoning token counts) live here
        if (usage.AdditionalProperties != null)
        {
            ShowNestedDictionary(usage.AdditionalProperties!, " ");
        }
    }
}

// Recursively prints a dictionary, indenting one level per nested dictionary
void ShowNestedDictionary(IDictionary<string, object> dictionary, string indent)
{
    foreach (var (key, value) in dictionary)
    {
        if (value is IDictionary<string, object> nestedDictionary)
        {
            Console.WriteLine($"{indent}{key}:");
            ShowNestedDictionary(nestedDictionary, indent + " ");
        }
        else
        {
            Console.WriteLine($"{indent}{key}: {value}");
        }
    }
}
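Since GetResponse runs once per chat turn, you may also want a running total across a whole conversation. Here is a minimal sketch of one way to do that. The `UsageTotals` class is hypothetical and not part of Microsoft.Extensions.AI. It accepts nullable counts because the token count properties on UsageDetails are nullable.

```csharp
using System;

public sealed class UsageTotals
{
    public long InputTokens { get; private set; }
    public long OutputTokens { get; private set; }
    public int Calls { get; private set; }
    public long TotalTokens => InputTokens + OutputTokens;

    // Adds the usage from one call; a missing (null) count is treated as zero.
    public void Add(long? inputTokens, long? outputTokens)
    {
        InputTokens += inputTokens ?? 0;
        OutputTokens += outputTokens ?? 0;
        Calls++;
    }
}
```

Inside GetResponse you could then call something like `totals.Add(usageDetails?.InputTokenCount, usageDetails?.OutputTokenCount)` after the streaming loop, and print the totals when the chat ends.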
Testing it out
Let's run a query through this code and see how many tokens we use. To keep costs to a minimum I'll guide the AI to give me a one-word answer (with an extra hint just to make sure it gives me a good answer)!
Your prompt:
Who will win the world cup? 1 word answer. Must start with "E"
AI Response started at 11/01/2025 16:52:05:
England.
AI Response completed at 11/01/2025 16:52:06:
 InputTokenCount: 43
 OutputTokenCount: 2
 TotalTokenCount: 45
 OutputTokenDetails:
  ReasoningTokenCount: 0
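Turning those token counts into an estimated cost is simple arithmetic once you know your model's prices. The sketch below assumes the provider quotes separate prices per million input and output tokens. The `TokenCost` class is hypothetical, and the prices in the usage note are placeholders rather than real figures, so check your provider's pricing page.

```csharp
using System;

public static class TokenCost
{
    // Estimates the cost of one call given prices per million tokens.
    // Prices are supplied by the caller; none are built in.
    public static decimal Estimate(
        long inputTokens, long outputTokens,
        decimal inputPricePerMillion, decimal outputPricePerMillion)
    {
        const decimal Million = 1_000_000m;
        return inputTokens / Million * inputPricePerMillion
             + outputTokens / Million * outputPricePerMillion;
    }
}
```

For the 43 input and 2 output tokens above, `TokenCost.Estimate(43, 2, 0.15m, 0.60m)` with placeholder prices of 0.15 and 0.60 per million tokens gives 0.00000765. That is a tiny amount for one call, but it adds up over millions of requests with long chat histories.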
Summary
If you're planning to use an AI model in production, then it's really important for you to keep track of cost. You want to minimize the number of tokens you use, and avoid the more expensive models if cheaper ones are capable of doing what you need. The UsageDetails capability of Microsoft.Extensions.AI allows you to know exactly how many tokens each of your interactions with the model used.