Varispeed Playback with NAudio using SoundTouch
In this post I will demonstrate how you can implement varispeed playback with NAudio using the excellent SoundTouch library. To do so, I’ve prepared a very simple Windows Forms Application that lets you load an audio file and play it at varying speeds.
The key parts of the application are as follows. First of all, the SoundTouchInterop32.cs and SoundTouchInterop64.cs files include the necessary PInvoke signatures to call the x86 and x64 versions of SoundTouch respectively. These are SoundTouch.dll and SoundTouch_x64.dll. Make sure these DLLs are available in the path when running.
I’ve also created a wrapper class called SoundTouch which simplifies access to SoundTouch and calls the correct PInvoke method depending on whether the process is 64 bit or not.
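As a rough illustration of that bitness check, here is a minimal sketch. The class and method names (NativeLibrarySelector, GetLibraryName) are hypothetical and not taken from the demo project; only the two DLL names come from the text above. Environment.Is64BitProcess is the standard .NET property for this test.

```csharp
using System;

// Hypothetical helper, not part of the demo project: shows the kind of check
// a wrapper can use to decide which set of PInvoke signatures to call.
public static class NativeLibrarySelector
{
    // Returns the SoundTouch DLL name that matches the given process bitness.
    public static string GetLibraryName(bool is64BitProcess)
    {
        return is64BitProcess ? "SoundTouch_x64.dll" : "SoundTouch.dll";
    }

    // Convenience overload that inspects the current process.
    public static string GetLibraryName()
    {
        return GetLibraryName(Environment.Is64BitProcess);
    }
}
```

A wrapper built this way makes the decision once per call site, so the rest of the application never needs to know which native DLL is loaded.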
The next part is VarispeedSampleProvider. This implements NAudio’s ISampleProvider interface so it can be easily inserted into a signal chain. It also exposes a PlaybackRate property which can be set to 1.0 for regular speed, 2.0 for 2x playback, and so on.
public VarispeedSampleProvider(ISampleProvider sourceProvider,
int readDurationMilliseconds, SoundTouchProfile soundTouchProfile)
The constructor for VarispeedSampleProvider takes a source provider, which is the input stream you want to speed up (in our example an audio file), and readDurationMilliseconds, which controls how much audio is read from the source provider in a single read. When you are speeding up or slowing down, the duration of audio you read from the source differs from the time taken to play that audio. Something like 100ms will be fine to use here.
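To make the relationship between read duration and playback time concrete, here is a small sketch of the arithmetic. The helper names (ReadSizeCalculator, SamplesForDuration, OutputMilliseconds) are hypothetical and only illustrate the calculation; they are not how the demo project is necessarily written.

```csharp
using System;

// Hypothetical helper illustrating the arithmetic; not from the demo project.
public static class ReadSizeCalculator
{
    // Number of float samples (summed across all channels) that cover
    // the given duration of source audio.
    public static int SamplesForDuration(int sampleRate, int channels, int durationMilliseconds)
    {
        return (int)((long)sampleRate * channels * durationMilliseconds / 1000);
    }

    // Approximate time the source audio takes to play back at the given rate.
    public static double OutputMilliseconds(int sourceMilliseconds, double playbackRate)
    {
        if (playbackRate <= 0) throw new ArgumentOutOfRangeException("playbackRate");
        return sourceMilliseconds / playbackRate;
    }
}
```

For 44.1kHz stereo audio, a 100ms read is 8820 samples. At a playback rate of 2.0 that audio plays in roughly 50ms, and at 0.5 it takes roughly 200ms.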
Finally, the SoundTouchProfile allows us to specify what SoundTouch options we want to use. There are a few switches you can experiment with which adjust quality and performance. The most significant is UseTempo. In tempo mode, SoundTouch will pitch compensate when you change speed, so the music will remain at the same pitch, just a different speed. This avoids the “chipmunk effect” when you play back at higher speeds. In the demo app I let you switch between modes while playback is stopped.
All that remains is to create an AudioFileReader, pass it into the VarispeedSampleProvider, and then pass that to the output device (WaveOutEvent in our example) to be played.
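The chain described above might be wired up roughly as follows. This is a sketch, not the demo project's exact code: VarispeedSampleProvider and SoundTouchProfile come from the demo project, and the SoundTouchProfile constructor arguments shown here (a tempo flag and an anti-aliasing flag) are an assumption about its shape. The Play method and class name are hypothetical.

```csharp
using System.Threading;
using NAudio.Wave;

// Sketch only: requires NAudio plus the demo project's VarispeedSampleProvider
// and SoundTouchProfile classes to compile.
public static class VarispeedPlaybackSketch
{
    public static void Play(string path, float playbackRate)
    {
        using (var reader = new AudioFileReader(path))
        using (var output = new WaveOutEvent())
        {
            // Assumed constructor shape: (useTempo, useAntiAliasing).
            var profile = new SoundTouchProfile(true, false);

            // 100ms read duration, as suggested in the text.
            var varispeed = new VarispeedSampleProvider(reader, 100, profile);
            varispeed.PlaybackRate = playbackRate;

            output.Init(varispeed);
            output.Play();

            // Block until playback finishes so the using blocks do not
            // dispose the reader and device early.
            while (output.PlaybackState == PlaybackState.Playing)
            {
                Thread.Sleep(100);
            }
        }
    }
}
```

In a Windows Forms application you would normally keep the reader and output device as fields and handle PlaybackStopped instead of blocking a thread like this.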
The one other thing worth mentioning in the demo project is that I allow you to reposition in the file, and when you do so, it’s a good idea to tell SoundTouch that a reposition has taken place so internal buffers can be flushed. This is done by calling Reposition on the VarispeedSampleProvider.
Want to see the code? You can access it on GitHub.
Comments
Tgd87
Hi Mark
I am building a prototype lip-sync application for animation.
I'd like to adjust the speed and tempo of segments (based on milliseconds) within a single audio.
I am trying to get your varispeed code to work with the concatenation class without much luck. This might not even be the best way to achieve what I am after.
May I ask what you would suggest I look at to varispeed adjust millisecond-based segments within a single audio file?
Thanks for your time. Please let me know if I can provide any further information.
Background
I have an audio file of spoken words and have aligned the text with the audio so I have a list of words with start and end times in milliseconds.
I'd like to be able to process each word's pitch and tempo using the millisecond timings from the alignment.
I understand that if I change the timing of one segment A then I need to add/subtract the change in A's duration from the positions of the next segments B, C, D, etc. This is a simple recalculating step I believe.
Cheers
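The recalculation step described in this comment could be sketched as below. The types and names (WordSegment, SegmentTimeline, Recalculate) are hypothetical and not part of NAudio or the demo project. The sketch assumes the segments are sorted by start time, do not overlap, and that gaps between words keep their original length.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration only.
public sealed class WordSegment
{
    public string Word;
    public double StartMs;
    public double EndMs;
    public double Tempo = 1.0; // 1.0 = unchanged, 2.0 = twice as fast
}

public static class SegmentTimeline
{
    // Returns new segments whose start and end times reflect each word's tempo.
    // Every change in a word's duration shifts all later words by the same amount.
    public static List<WordSegment> Recalculate(IList<WordSegment> segments)
    {
        var result = new List<WordSegment>();
        double offsetMs = 0; // accumulated shift from earlier segments

        foreach (var s in segments)
        {
            if (s.Tempo <= 0) throw new ArgumentException("Tempo must be positive");

            double originalDuration = s.EndMs - s.StartMs;
            double newDuration = originalDuration / s.Tempo;
            double newStart = s.StartMs + offsetMs;

            result.Add(new WordSegment
            {
                Word = s.Word,
                StartMs = newStart,
                EndMs = newStart + newDuration,
                Tempo = s.Tempo
            });

            offsetMs += newDuration - originalDuration;
        }
        return result;
    }
}
```

For example, with word A at 0 to 400ms played at tempo 2.0 and word B at 500 to 900ms left unchanged, A becomes 0 to 200ms and B moves to 300 to 700ms.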
Mark Heath
I'd make my own custom IWaveProvider for something like this. That way you can control exactly how many bytes are passed through the VarispeedWaveProvider, before changing the pitch settings to other values.
Tgd87
Great thanks for your reply Mark. I'll give that a shot
Cheers
ScooterGirl
The following line of code in class VarispeedSampleProvider throws two errors on build:
public WaveFormat WaveFormat => sourceProvider.WaveFormat;
Errors:
; expected
Invalid token ';' in class, struct, or interface member declaration
Any ideas?
Mark Heath
what version of Visual Studio are you using?
ScooterGirl
VS2013. I’m thinking I should be in 2015 or greater?
Mark Heath
Yes, I'd recommend going to 2017, there's no good reason not to, and you can use the latest C# features
Zam
Hello Mark, may I use this useful SampleProvider on my own Open Source project... maybe also doing some modifications?
Mark Heath
of course, glad its of use to you
Zam
Thanks!
Also gratz for the great work
blahblahblahblah
Thanks for this tutorial! Please excuse the stupid question, but do you know how it would be possible to load a WAV file instead of MP3?
EDIT:
Nevermind, I just needed to create a WaveStream first:
WaveStream waveStream = new WaveFileReader(filePath);
WaveChannel32 inputStream = new WaveChannel32(waveStream);
Mark Heath
Just use AudioFileReader for a simpler way to do this
FlusherCheese
What do you mean? i wanted to do the same thing...
Mijin
Hi Mark, I'm using your code in an app that processes educational audio files (e.g. to automatically repeat sections, or play sections more slowly, etc)
It's all good, except that the sound quality is quite poor when slowing down audio; it becomes very choppy; it sounds like some kind of unintentional reverb.
Any clues? From initial debugging it looks like the issue is in the actual SoundTouch library, not your wrapper.
Mark Heath
Yes, unfortunately it's very hard to generate natural sounding slowed down audio. You'll hear similar artefacts slowing down a YouTube video below 50%
Mijin
Thanks for the reply.
I would say though that YouTube-quality audio slowdown would be fantastic. The results I'm getting right now are definitely inferior; even slowing down to 80% speed already results in audio that you need to strain to clearly hear exactly what's been spoken.
Mijin
OK, I just tried randomly screwing around with the SoundTouch settings (without entirely knowing what I was doing).
I found that reducing the SequenceMS parameter greatly improves the clarity of slowed down speech.