In this post I am going to explain how the NAudio MediaFoundationEncoder class can be used to convert WAV files into other formats such as WMA, AAC and MP3. And to do so, I'll walk you through a real-world example of some code I created recently that uses it.

The application is my new Skype Voice Changer utility, and I wanted to allow users to save their Skype conversations in a variety of different formats. The wave format that Skype uses is 16kHz, 16 bit, mono PCM, and I capture the audio directly in this format before converting it to the target format when the call finishes.

The input to the MediaFoundationEncoder doesn't actually have to be a WAV file. It can be any IWaveProvider, so there is no need to create a temporary WAV file before the encoding takes place.

Initialising Media Foundation

The first step is to make sure Media Foundation is initialised. This requires a call to MediaFoundationApi.Startup(). You only need to do this once in your application, but it doesn't matter if you call it more than once. Note that Media Foundation is only supported on Windows Vista and above, so if you need to support Windows XP you will not be able to use Media Foundation.
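
In code that's just one line (this assumes you are using the MediaFoundationApi class from the NAudio.MediaFoundation namespace):

// call once at application startup; calling it again does no harm 
MediaFoundationApi.Startup(); 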

Determining Codec Availability

Since I planned to make use of whatever encoders are available on the user's machine, I don't need to ship any codecs with my application. However, not all codecs are present on all versions of Windows. The Windows Media Audio (and Windows Media Voice) codecs are, unsurprisingly, present on all desktop editions of Windows from Vista onwards. Windows 7 introduced an AAC encoder, and it was only with Windows 8 that we finally got an MP3 encoder (although MP3 decoding has been present in Windows for a long time). There are rumours that a FLAC encoder will be present in Windows 10.

For server versions of Windows, the story is a bit more complicated. Basically, you may find you have to install the "Desktop Experience" before you have any codecs available.

But the best way to find out whether the codec you want is available is simply to ask the Media Foundation APIs whether there are any encoders that can target your desired format for the given input format.

MediaFoundationEncoder includes a useful helper function called SelectMediaType which can help you do this. You pass in the MediaSubtype (basically a GUID indicating whether you want AAC, WMA, MP3 etc), the input PCM format, and a desired bitrate. NAudio will return the "MediaType" that most closely matches your bitrate. This is because many of these codecs offer you a choice of bitrates so you can choose your own trade-off between file size and audio quality. For the lowest bitrate available, just pass in 0. For the highest bitrate, pass in a suitably large number.

So for example, if I wanted to see if I can encode to WMA, I would pass in an audio subtype of WMAudioV8 (this selects the right encoder), a WaveFormat that matches my input format (this is important as it includes what sample rate my input audio is at - encoders don't always support all sample rates), and my desired bitrate. I passed in 16kbps to get a nice compact file size.

var mediaType = MediaFoundationEncoder.SelectMediaType(
                    AudioSubtypes.MFAudioFormat_WMAudioV8, 
                    new WaveFormat(16000, 1), 
                    16000); 

if (mediaType != null) // we can encode… 

What about MP3 and AAC? Well you might think that the code would be the same. Just pass in MFAudioFormat_MP3 or MFAudioFormat_AAC as the first parameter. The trouble is, if we do this, we get no media type returned, even on Windows 8 which has both an MP3 encoder and an AAC encoder. Why is this? Well it's because the MP3 and AAC encoders supplied with Windows don't support 16kHz as an input sample rate. So we will need to upsample to 44.1kHz before passing it into the encoder. So now let's ask Windows if there is an MP3 encoder available that can encode mono 44.1kHz audio, and just request the lowest bitrate available:

mediaType = MediaFoundationEncoder.SelectMediaType(
            AudioSubtypes.MFAudioFormat_MP3, 
            new WaveFormat(44100,1), 
            0); 

Now (on Windows 8 at least) we do get back a media type, and it has a bitrate of 48kbps. The same applies to AAC - we need to upsample to 44.1kHz first, and the AAC encoder provided with Windows 7 and above has a minimum bitrate of 96kbps.
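
The equivalent check for AAC follows the same pattern (a sketch, assuming the MFAudioFormat_AAC subtype and a 44.1kHz mono input):

mediaType = MediaFoundationEncoder.SelectMediaType(
            AudioSubtypes.MFAudioFormat_AAC, 
            new WaveFormat(44100, 1), 
            0); // on Windows 7 and above, expect 96kbps as the lowest available bitrate 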

Performing the Encoding

So, assuming that we've successfully got a MediaType, how do we go about the encoding? Well thankfully, that's the easy bit. So for example, if we had selected a WMA media type, we could encode to a file like this:

using (var enc = new MediaFoundationEncoder(mediaType)) 
{ 
    enc.Encode("output.wma", myWaveProvider); 
} 

In fact, to make things even simpler, MediaFoundationEncoder includes some helper methods for encoding to WMA, AAC and MP3 in a single line. You specify the wave provider, the output filename and the desired bitrate:

MediaFoundationEncoder.EncodeToMp3(myWaveProvider, 
                        "output.mp3", 48000); 

Creating your Pipeline

But of course the bit I haven't explained is how to set up the input stream to the encoder. This will need to be PCM (or IEEE float), and as we mentioned, it should be at a sample rate that the encoder supports. Here's an example of encoding a WAV file to MP3, but remember that the input WAV file will need to be 44.1kHz or 48kHz for this to work.

using (var reader = new WaveFileReader("input.wav")) 
{ 
    MediaFoundationEncoder.EncodeToMp3(reader, 
            "output.mp3", 48000); 
} 

But that was a trivial example. In a real-world application, such as my Skype Voice Changer, we have a more complicated setup. First, we open the inbound and outbound recording files with WaveFileReader. Then we mix them together using a MixingSampleProvider. Then, since I limit unregistered users to 30 seconds of recording, we optionally need to truncate the length of the file (I do this with an OffsetSampleProvider, using its Take property). Then, if the user selected MP3 or AAC, we need to resample up to 44.1kHz; since we're already working with Media Foundation, we'll use the MediaFoundationResampler for this. And finally, I go back down to 16 bit before encoding using a SampleToWaveProvider16 (although this is not strictly necessary for most Media Foundation encoders).

// open the separate recordings 
var incoming = new WaveFileReader("incoming.wav"); 
var outgoing = new WaveFileReader("outgoing.wav"); 

// create a mixer (for 16kHz mono) 
var mixer = new MixingSampleProvider(
                WaveFormat.CreateIeeeFloatWaveFormat(16000,1)); 

// add the inputs - they will automatically be turned into ISampleProviders 
mixer.AddMixerInput(incoming); 
mixer.AddMixerInput(outgoing); 

// optionally truncate to 30 seconds for unlicensed users 
var truncated = truncateAudio ? 
                new OffsetSampleProvider(mixer) 
                    { Take = TimeSpan.FromSeconds(30) } : 
                (ISampleProvider) mixer; 

// go back down to 16 bit PCM 
var converted16Bit = new SampleToWaveProvider16(truncated); 

// now for MP3, we need to upsample to 44.1kHz. Use MediaFoundationResampler 
using (var resampled = new MediaFoundationResampler(
            converted16Bit, new WaveFormat(44100, 1))) 
{ 
    var desiredBitRate = 0; // ask for lowest available bitrate 
    MediaFoundationEncoder.EncodeToMp3(resampled, 
                    "mixed.mp3", desiredBitRate); 
} 

Hopefully that gives you a feel for the power of chaining together IWaveProviders and ISampleProviders in NAudio to construct complex and interesting signal chains. You should now be able to encode your audio with any Media Foundation encoder present on the user's system.

Footnote: Encoding to Streams

One question you may have is "can I encode to a stream?" Unfortunately, this is a little tricky to do, since NAudio takes advantage of various "sink writers" that Media Foundation provides, which know how to correctly create various audio container file formats such as WMA, MP3 and AAC. This means that, for simplicity, the MediaFoundationEncoder class only offers encoding to a file. To encode to a stream, you'd need to work at a lower level with Media Foundation transforms directly, which is quite a complicated and involved process. Hopefully this is something we can add support for in a future version of NAudio.

Want to get up to speed with the fundamental principles of digital audio and how to go about writing audio applications with NAudio? Be sure to check out my Pluralsight courses, Digital Audio Fundamentals and Audio Programming with NAudio.

Comments

Comment by Brian Lorraine

I have a variable number of wav files going into a mixer like in your example, and I'm trying to get the mixer to spit out an MP3 file (this is a C# ASP.NET application using the latest NAudio). All the wav files are 44100Hz, 2 channels, 16 bit.
var mixer = new MixingSampleProvider(WaveFormat.CreateIeeeFloatWaveFormat(44100, 2));
...
mixer.AddMixerInput(stream1); //repeat a variable number of times
...
MediaFoundationApi.Startup();
var converted16Bit = new SampleToWaveProvider16(mixer);
using (var resampled = new MediaFoundationResampler(converted16Bit, new WaveFormat(44100, 2)))
{
var desiredBitRate = 0;
MediaFoundationEncoder.EncodeToMp3(converted16Bit, mymainfile, desiredBitRate);
}
So this KIND OF works, with one problem. The page executes this code and then it "freezes" (the web page just spins). So I go to the folder where the output file "mymainfile" is supposed to be created while it's spinning. I see the file I created. It SAYS that the mp3 file is only either 0k or 187KB or sometimes around 400K. When I stop debugging, then suddenly the mp3 file is like 20-30 megs... it's like 20 minutes long. My wav files were only a few seconds, and it starts with those files mixed together and then just 20 minutes of silence. If I debug, it just freezes on this last interior line "..EncodeToMp3..."
So I take it this mixer just keeps going. Is there a way to explicitly tell it to stop (after a certain number of bytes)?

Brian Lorraine
Comment by Mark Heath

Yes, the mixer can auto-stop when its last input has finished. It does this so long as ReadFully (badly named I know) is set to false (which is the default).
You must also make sure that the inputs passed to the mixer are not never-ending, or that will cause the mix to be never-ending as well.

Mark Heath
Comment by Brian Lorraine

Well, they shouldn't be never-ending. They're just simple wav files.
So these are the streams that are getting added in the "mixer.AddMixerInput(stream1);" below:
---------------------------
string myoutfile1 = Server.MapPath("Audio/" + "test1_" + myid + ".WAV");
AudioFileReader reader1a;
reader1a = new AudioFileReader(myoutfile1);
reader1a.Volume = float.Parse(volval.Text);
WaveChannel32 stream1 = new WaveChannel32(reader1a);
---------------------------
These source wav files themselves were also created programmatically with NAudio using a WaveFileWriter. I wasn't sure if there was some kind of optional (but recommended) meta/header information I needed to manually, programmatically add to the original source wav file, like declaring the # of bytes that the file should be... that might be causing it to keep reading.

Brian Lorraine
Comment by Mark Heath

OK, I see you're using `WaveChannel32`. That's a really old class and will return never-ending audio. Generally I recommend using `MixingSampleProvider` fed by `AudioFileReader` for this use. But if you set `PadWithZeroes = false` on your `WaveChannel32` that should also fix this issue
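
For reference, a minimal sketch of that suggested setup (file names are hypothetical):

var reader1 = new AudioFileReader("test1.wav") { Volume = 0.8f }; 
var reader2 = new AudioFileReader("test2.wav"); 
var mixer = new MixingSampleProvider(
                WaveFormat.CreateIeeeFloatWaveFormat(44100, 2)); 
mixer.ReadFully = false; // the default - the mixer stops when its last input finishes 
mixer.AddMixerInput(reader1); 
mixer.AddMixerInput(reader2); 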

Mark Heath
Comment by Brian Lorraine

Also, just want to thank you for the time you took to create nAudio. For the last two years I've wanted to write a mobile music sampling app, and I wanted to write it in C#. I tried things like Xamarin Studio, but it couldn't keep up with the latest MS stuff I needed.
Eventually I decided to just work with what I knew: So I went with the web (ASP.NET/Windows Azure: Server with desktop experience, SQL Server back-end)... and made a sampler/drum-machine site that basically includes source control and custom uploads... Sort of an homage to the old demo-scene/mod/s3m Tracker days. As far as .NET audio libraries go, yours made it 100x easier than the others I tried.
http://doomloops.cloudapp.net
It's got a LONG way to go. I've barely just scratched the surface of the nAudio stuff. So far, it's awesome.

Brian Lorraine
Comment by Mark Heath

thanks Brian, looks awesome what you've built so far

Mark Heath
Comment by Brian Lorraine

Thanks!!!

Brian Lorraine
Comment by Mamadou Bah

Hi Mark,
I wonder how to ship codecs with my application (azure function app)

Mamadou Bah
Comment by Mark Heath

Unfortunately it's very hard to do that. You can't install codecs onto an Azure App Service plan. So it depends on what codecs you need to use - if they are available as DLLs you can reference.

Mark Heath
Comment by Mamadou Bah

Thanks Mark, for your reply.
The Azure Function works perfectly for audio recorded on Android and iOS devices (3gp and aac), but for web formats from Chrome (.webm) or Firefox (.ogg) it just never works.
No audio file reader is able to open them; I tried AudioFileReader and MediaFoundationReader, and neither supports them.
On my local machine Chrome (.webm) works, so that led me to think maybe I can grab the codecs from my machine and ship them with the app by putting them in the bin folder.
I just need codecs to support one or both of these two browser audio formats so I can actually demo my app. I have put so much effort into making it work, in vain. If I can get even Chrome to work, it would be a great relief. Thanks again for your time.

Mamadou Bah
Comment by Mark Heath

Those formats typically aren't supported on Windows. One option is to transcode with something like ffmpeg. I have done that in the past on Azure Functions, but bear in mind that on the Consumption plan you're limited to a five minute function execution so not enough time to do a large transcode.

Mark Heath
Comment by Mamadou Bah

Ok, thanks for that tip on consumption plan. I'm not familiar with audio transcoding. Is that a command line tool to do audio conversion? I will research it then and see if that will solve my problem.

Mamadou Bah
Comment by Mark Heath

yes, and there are other tools like sox that can do command line audio conversion. Obviously there is Azure Media Services which can do transcoding, but that is a bit more involved to configure

Mark Heath
Comment by Mamadou Bah

Thanks a lot Mark, the information has been helpful and will allow me to have a narrowed, focused and guided research effort. Thank you.

Mamadou Bah
Comment by Joseph Glass

Hi Mark,
I'm trying to use NAudio to get an audio stream from a video file, sample this stream, and generate waveform data from it. I've tried to read the bytes from a video file using both MediaFoundationReader and BufferedWaveProvider, and I have tried encoding this byte array to Mp3 with MediaFoundationEncoder. At this last step, the job hangs indefinitely. Do you have any advice on how I should be going about this task?
Thank you for your time,
Joseph

Joseph Glass
Comment by Anderson Nunes

Hi Mark,
I'm building an audio logger for my radio broadcast software suite, codename FREEWAVE, and I'm porting it from another library to NAudio, like the other modules. So I use Media Foundation to do it. I record to a WAV file, and every hour of the day encode it to MP3 or another format (I'd prefer OPUS, but I don't know how to customise MF to use it yet).
I have three questions:
1. How can I set the bitrate (e.g. for MP3)? Even when I set it to 64000, the file is encoded at about 100kbps.
2. Is there a guide on how to use the OPUS codec with MF to encode to a file?
3. Can I encode the wave samples or buffer on the fly, directly to disk (without recording a WAV file and encoding afterwards)?
Thank you so much and God bless you for your brilliant work with NAudio. 🙏

Anderson Nunes