Q & A: Vocal Isolation

Danzibar writes: 

Hello!
How can I isolate vocals from a song? I’m using Cubase SX, but have very basic knowledge of the programme so far. It’s a very simple song, vocals and mbira. I want to get rid of the mbira and use the vocal track.
Help would be much appreciated.
Cheers.


adminlogo: Thanks for writing in, Danzibar. That’s a really common question we sound guys get.  Unfortunately, the answer comes with lots of “ifs and buts.”

Short Answer: 

First of all, as far as I’m aware, there aren’t any “perfect” methods of extracting vocals from a mix.  There are plugins and processes that try to separate the audio in the middle from the audio on the sides.  This kind of processing is called “Mid-Side” processing (different from, but related to, Mid-Side recording).  It will help you extract a vocal if that vocal happens to be the only, or at least the loudest, track centered in the mix.

The catch is, you’ll generally still get a lot of bleed, and if it’s not done right, the processing can create all sorts of funky artifacts and distortion.  But it’s kind of the “best” option out there for now.

Here are a couple of free things that might help you on your way.

Brainworx makes a free plugin called “Solo” which lets you, well, “solo” the mid and sides of your track.  Try throwing that on a track and soloing your mid.  Then throw on some EQ to cut out the extreme highs and lows and lower the volume of the remaining bleed.  Plugin formats (VST, AU, RTAS, TDM).

Blue Cat Audio makes a cool set of plugs, one of which is a mid-side gain plugin.  If you try this one, you can “play” with how much mid and side you get by adjusting the gains.  Plugin formats (VST, AU, DX). More on that here.

Oh and check out their description of mid-side processing here.
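
If you’re curious what a mid-side gain control is doing under the hood, here’s a rough Python sketch. I don’t know how Blue Cat actually implements theirs, so treat this as an assumption: it uses the common convention mid = (L + R) / 2 and side = (L - R) / 2, and the function name and sample values are made up for illustration.

```python
def ms_gain(left, right, mid_gain=1.0, side_gain=1.0):
    """Encode L/R to mid/side, scale each, then decode back to L/R."""
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = (l + r) / 2.0 * mid_gain     # what the two channels share
        side = (l - r) / 2.0 * side_gain   # how the two channels differ
        out_left.append(mid + side)
        out_right.append(mid - side)
    return out_left, out_right

# Made-up stereo samples, just to have something to push through.
left = [0.75, 0.25, -0.5]
right = [0.25, 0.25, 0.5]

print(ms_gain(left, right))                  # unity gains: same as the input
print(ms_gain(left, right, side_gain=0.0))   # sides muted: L and R both become the mid
```

Pulling the side gain down toward zero is the “solo the mid” move. Anything in between lets you trade off bleed against how natural the result sounds.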

Here’s an Example:

The Original Stereo Mix

The Sides Channel – What you’re trying to remove.

The Mid Channel – The sides removed from the stereo mix, then high-pass filtered at 150 Hz.

Notice how on the Mid channel, you’ve still got a fair amount of snare and hat, and you can still make out some bass/kick.  This is because those tracks were also centered in the mix, so this kind of processing won’t entirely get rid of them. That said, I was relatively successful in raising the vocal above a lot of the other elements in the mix.  This kind of processing really requires tweaking, and I did the above example in about ten minutes.

 

***

(Much) Longer Answer: 

The basic idea behind most vocal extraction plugins is to use the nature of stereo to your advantage.  If you have a stereo audio file and a vocal is panned completely center, then it theoretically has equal level in both the left and right channels.  If that vocal is panned all the way to the left, or hard left, then it roughly has full volume on the left and is not present on the right.  Panning hard right reverses the situation.

Nerd Note: This is roughly true, as panning is handled differently by different hardware/software, and a lot of panning mechanisms aren’t actually linear in scale. More on that here.
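
To make that nerd note a bit more concrete, here’s a rough Python sketch of two common pan laws. The function names and the pan range of -1 (hard left) to +1 (hard right) are my own choices for illustration, not how any particular DAW does it.

```python
import math

def linear_pan(pan):
    """Linear pan law: the two gains always sum to 1."""
    position = (pan + 1.0) / 2.0          # map -1..+1 to 0..1
    return 1.0 - position, position       # (left gain, right gain)

def constant_power_pan(pan):
    """Constant-power (sin/cos) pan law: the squared gains always sum to 1."""
    angle = (pan + 1.0) * math.pi / 4.0   # map -1..+1 to 0..pi/2
    return math.cos(angle), math.sin(angle)

for pan in (-1.0, 0.0, 1.0):
    print(pan, linear_pan(pan), constant_power_pan(pan))
```

The two laws disagree about how loud a centered source is (about 6 dB down per channel for linear, about 3 dB down for constant-power), but either way a centered source gets the same gain in both channels. That equality is what the mid-side trick relies on.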

But how does that help us with vocal extraction?  In theory, there are ways to “separate” the audio that differs between the L and R channels from the audio that is the same in both.  That might sound a bit confusing, but think of it as being able to take all the things from the “sides” and separate them from the things that are in the “center”.

Great, that’s exactly what we want, right? Well, it doesn’t generally work perfectly.  Most of the time, there is stuff other than your vocal in the center, and you often get some “bleed” from the sides into your center channel as well.
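
Here’s a minimal sketch of that separation in plain Python, assuming the common convention mid = (L + R) / 2 and side = (L - R) / 2 (some tools scale these differently). The sample values are made up: a “vocal” panned dead center and a “guitar” panned hard left.

```python
def ms_encode(left, right):
    """Split stereo samples into mid (what L and R share) and side (how they differ)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

# Made-up sample values for illustration.
vocal = [0.5, -0.25, 0.75, 0.0]     # panned center: equal in L and R
guitar = [0.25, 0.5, -0.5, 0.25]    # panned hard left: only in L

left = [v + g for v, g in zip(vocal, guitar)]
right = list(vocal)

mid, side = ms_encode(left, right)
print("mid: ", mid)    # the vocal plus half of the guitar (the bleed)
print("side:", side)   # half of the guitar, no vocal at all
```

Notice that the hard-panned guitar doesn’t vanish from the mid. It just drops to half level, which is exactly the bleed I’m talking about.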

There’s a great article on the nitty gritty of this process, here. It explains a lot about how FM stereo signals work, which is basically what this processing is based on.

After you’ve isolated the “mid” channel, or the center, you should be left with your main vocal, and anything else that was panned center in the mix. Usually that includes bass guitar, kick drum, snare drum, and some other melodic tracks.

Now comes the part that takes a little skill, tweaking, and patience.  You have to EQ and gate the track to try to reduce the volume of things other than the vocal.  Start by high-pass filtering the track.  I’d suggest trying a pretty steep filter and then bringing the cutoff up while listening to the track.  Try to find a frequency between the kick drum/bass elements and the lower end of your vocal.  I often find it somewhere between 150 and 250 Hz, but listen for yourself.
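
If you want to see what a high-pass filter does to the numbers, here’s a rough Python sketch. It uses the standard biquad high-pass from the RBJ “Audio EQ Cookbook”, which is only 12 dB/octave, so it’s gentler than the steep filter I’d reach for in a DAW (you could run it twice for a steeper slope). The 60 Hz and 440 Hz sine waves are stand-ins I picked for “kick/bass” and “vocal”, and the 200 Hz cutoff is just a value from the range above.

```python
import math

def highpass_biquad(samples, cutoff_hz, sample_rate, q=0.7071):
    """12 dB/octave high-pass (RBJ cookbook biquad), direct form I."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    # Coefficients normalized by a0.
    b0 = (1.0 + cos_w0) / 2.0 / a0
    b1 = -(1.0 + cos_w0) / a0
    b2 = b0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0

    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

sample_rate = 44100
t = [n / sample_rate for n in range(sample_rate)]       # one second
kick = [math.sin(2 * math.pi * 60 * x) for x in t]      # stand-in for kick/bass
voice = [math.sin(2 * math.pi * 440 * x) for x in t]    # stand-in for a vocal

print("60 Hz level after filter: ", rms(highpass_biquad(kick, 200, sample_rate)))
print("440 Hz level after filter:", rms(highpass_biquad(voice, 200, sample_rate)))
```

The low tone comes out much quieter while the higher one passes nearly untouched. Real vocals and kick drums overlap a lot more than two sine waves do, which is why you have to do this part by ear.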

If you have way too much high end in the Mid channel, you can try Low Pass Filtering, but this can make your vocal sound muted or unnatural, so at the very least, use a shallow slope (6-12 dB/octave).

Next, try a gate.  You’re going to try to set the threshold of the gate below the volume of the vocal but above everything else.  This can get tricky, and you probably won’t be able to catch everything.  The goal here is to let the gate “turn down” the parts between vocal lines, and leave the actual vocal parts untouched.

Once the threshold is set, mess with the “range” or “reduction”. Try something heavy first, like 60 dB, then bring it back to make the gating sound more natural.  Setting the attack/release/hold really requires some finesse. I’d suggest a rather quick attack, then play with the hold and release till it sounds more natural.
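
For the curious, here’s a bare-bones gate in Python showing how threshold, range, attack, hold, and release fit together. This is a simplified sketch, not how any particular plugin works, and all the default values and the test signal (a loud tone standing in for the vocal, followed by a quiet one standing in for bleed) are assumptions I made up for the demo.

```python
import math

def noise_gate(samples, sample_rate, threshold_db=-30.0, range_db=60.0,
               attack_ms=1.0, hold_ms=50.0, release_ms=100.0):
    """Turn the signal down by range_db whenever it stays below the threshold."""
    threshold = 10.0 ** (threshold_db / 20.0)
    floor_gain = 10.0 ** (-range_db / 20.0)
    # One-pole smoothing so the gain glides instead of clicking.
    attack = math.exp(-1.0 / (attack_ms * 0.001 * sample_rate))
    release = math.exp(-1.0 / (release_ms * 0.001 * sample_rate))
    hold_samples = int(hold_ms * 0.001 * sample_rate)

    gain = floor_gain
    hold_left = 0
    out = []
    for x in samples:
        if abs(x) >= threshold:
            hold_left = hold_samples   # loud enough: open and restart the hold timer
            is_open = True
        elif hold_left > 0:
            hold_left -= 1             # quiet, but still inside the hold time
            is_open = True
        else:
            is_open = False
        target = 1.0 if is_open else floor_gain
        coeff = attack if target > gain else release
        gain = coeff * gain + (1.0 - coeff) * target
        out.append(x * gain)
    return out

sample_rate = 44100
half = sample_rate // 2
vocal = [0.5 * math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(half)]
bleed = [0.01 * math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(half)]
gated = noise_gate(vocal + bleed, sample_rate)

print("peak during the vocal:", max(abs(s) for s in gated[:half]))
print("peak at the tail:     ", max(abs(s) for s in gated[-4410:]))
```

Try a smaller range_db or a longer release_ms and watch how the tail level changes. That’s the same “bring it back till it sounds natural” tweaking, just in numbers.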

Lastly, you can try some spectral editors. They let you see your audio in an FFT view, which is a fancy way of saying you can see Time (left-to-right), Frequency (top-to-bottom), and Volume (either shades of color or brightness) all in one view.

Spectral view of audio in iZotope Rx.

I loaded the Mid channel into my spectral editor, and zoomed to a phrase with a snare drum hit in it. Then I selected the snare drum hit.

 

Selecting a snare drum hit.

Then I processed it using “spectral repair.”  Some editors call it “repair” or “remove event.”  In any case, it is a function that tries to remove a transient sound (like a drum hit) and overwrite it using the audio before and after the hit.  It kind of works like Photoshop’s smudge tool, if you’re familiar with that.
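
To give a feel for the “overwrite it using the audio before and after” idea, here’s a toy Python sketch. To be clear, this is not iZotope’s algorithm (I don’t know its internals, and real tools are far more sophisticated). It just blends linearly across the selected region of a grid of made-up magnitude values, where each row is a time frame and each column is a frequency bin. It assumes there’s at least one frame before and one frame after the selection.

```python
def patch_region(spectrogram, t_start, t_end, f_start, f_end):
    """Overwrite frames [t_start, t_end) in bins [f_start, f_end) by blending
    linearly between the frame just before and the frame just after the selection.
    spectrogram[t][f] holds the magnitude at time frame t, frequency bin f."""
    before = spectrogram[t_start - 1]
    after = spectrogram[t_end]
    span = t_end - t_start + 1
    repaired = [frame[:] for frame in spectrogram]   # leave the input untouched
    for t in range(t_start, t_end):
        weight = (t - t_start + 1) / span            # 0 = like "before", 1 = like "after"
        for f in range(f_start, f_end):
            repaired[t][f] = (1.0 - weight) * before[f] + weight * after[f]
    return repaired

# Toy magnitudes: a steady "vocal" in bin 1, with a "snare hit" splashed
# across every bin in frames 2 and 3.
spec = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.75, 1.75, 0.75, 0.75],
    [0.5, 1.5, 0.5, 0.5],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]

for frame in patch_region(spec, 2, 4, 0, 4):
    print(frame)
```

In this toy case the hit disappears and the steady “vocal” bin carries straight through. On real audio the vocal is moving underneath the hit, so expect some smearing, which is where the smudge tool comparison comes from.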

 

Before the Spectral Repair.

After the Spectral Repair.

I use iZotope Rx, which is kind of an investment, but works great.  There are other spectral editors out there that you can try, though I’m not familiar with too many.  I’ll try to add a list of some soon.
