# What is a “virtual” loudspeaker? Part 2

#91.2 in a series of articles about the technology behind Bang & Olufsen

In Part 1, I talked about how a binaural recording is made, and I also mentioned that the spatial effects may or may not work well for you for a number of different reasons.

Let’s go back to the free field with a single “perfect” microphone to measure what’s happening, but this time, we’ll send sound out of two identical “perfect” loudspeakers. The distances from the loudspeakers to the microphone are identical. The only difference in this hypothetical world is that the two loudspeakers are in different positions (measured as a rotational angle) as shown in Figure 1.

In this example, because everything is perfect, and the space is a free field, the output of the microphone will be the sum of the outputs of the two loudspeakers. (In the same way that if your dog and your cat are both asking for dinner simultaneously, you’ll hear dog+cat and have to decide which is more annoying and therefore gets fed first…)

IF the system is perfect as I described above, then we can play some tricks that could be useful. For example, since the output of the microphone is the sum of the outputs of the two loudspeakers, what happens if the output of one loudspeaker is identical to the other loudspeaker, but reversed in polarity?

In this example, we’re manipulating the signals so that, when they add together, you get nothing at the output. This is because, at any moment in time, the value of Loudspeaker 2’s output is the value of Loudspeaker 1’s output * -1. So, in other words, we’re just subtracting the signal from itself at the microphone and we get something called “perfect cancellation” because the two signals cancel each other at all times.

Of course, if anything changes, then this perfect cancellation won’t work. For example, if one of the loudspeakers moves a little farther away than the other, then the system is broken, as shown below.
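To put some rough numbers on this, here is a minimal sketch of the two cases. The sample rate, the 1 kHz test tone and the 5-sample delay are my own assumptions, chosen only for illustration.

```python
import math

SAMPLE_RATE = 48000  # assumed sample rate in Hz


def tone(freq_hz, n_samples, delay_samples=0):
    """A sine tone, optionally delayed by a whole number of samples."""
    return [
        math.sin(2 * math.pi * freq_hz * (n - delay_samples) / SAMPLE_RATE)
        for n in range(n_samples)
    ]


def mic_sum(speaker_1, speaker_2):
    """In a free field the microphone simply adds the two arrivals."""
    return [a + b for a, b in zip(speaker_1, speaker_2)]


n = 480
speaker_1 = tone(1000, n)

# Loudspeaker 2 plays the same signal multiplied by -1.
aligned = mic_sum(speaker_1, [-x for x in tone(1000, n)])

# Same again, but Loudspeaker 2's sound arrives 5 samples late
# (a path that is roughly 3.6 cm longer at 343 m/s).
late = mic_sum(speaker_1, [-x for x in tone(1000, n, delay_samples=5)])

peak_aligned = max(abs(x) for x in aligned)
peak_late = max(abs(x) for x in late)
print(peak_aligned, peak_late)
```

With the two arrivals aligned, the sum is silent. With a few centimetres of extra path, a large part of the tone survives.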

Again, everything that I’ve said above only works when everything is perfect, and the loudspeakers and the microphone are in a free field; so there are no reflections coming in and ruining everything.

We can now combine these two concepts:

1. using binaural signals to simulate a sound source in a location (although this would normally be done using playback over earphones to keep it simple) and
2. using signals from loudspeakers to cancel each other at some location in space as a

to create a system for making virtual loudspeakers.

Let’s suspend our adherence to reality and continue with this hypothetical world where everything works as we want… We’ll replace the microphone with a person and consider what happens. To start, let’s just think about the output of the left loudspeaker.

If we plot the impulse responses at the two ears (the “click” sound from the loudspeaker after it’s been modified by the HRTFs for that loudspeaker location), they’ll look like this:

What if we were able to send a signal out of the right loudspeaker so that it cancels the signal from the left loudspeaker at the location of the right eardrum?

Unfortunately, this is not quite as easy as it sounds, since the HRTF of the right loudspeaker at the right ear is also in the picture, so we have to be a bit clever about this.

So, in order for this to work we:

• Send a signal out of the left loudspeaker.
We know that this will get to the right eardrum after it’s been messed up by the HRTF. This is what we want to cancel…
• …so we take that same signal, and
• filter it with the inverse of the HRTF of the right loudspeaker
(to undo the effects of the HRTF of the right loudspeaker’s signal at the right ear)
• filter that with the HRTF of the left loudspeaker at the right ear
• multiply by -1
(so that it will cancel when everything comes together at your right eardrum)
• and send it out the right loudspeaker.

Hypothetically, that signal (from the right loudspeaker) will reach your right eardrum at the same time as the unprocessed signal from the left loudspeaker and the two will cancel each other, just like the simple example shown in Figure 3. This effect is called crosstalk cancellation, because we use the signal from one loudspeaker to cancel the sound from the other loudspeaker that crosses to the wrong side of your head.
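The recipe above can be sketched at a single frequency, where each HRTF reduces to one complex gain (a magnitude and a phase). The four values and the names `H_LL`, `H_LR`, `H_RR` and `H_RL` are invented for illustration and are not measured HRTFs.

```python
import cmath

# Hypothetical HRTF values at a single frequency, written as complex gains
# (magnitude and phase). The numbers are invented for illustration only.
H_LL = cmath.rect(1.0, -0.3)   # left loudspeaker  -> left ear
H_LR = cmath.rect(0.4, -1.1)   # left loudspeaker  -> right ear (the crosstalk)
H_RR = cmath.rect(1.0, -0.3)   # right loudspeaker -> right ear
H_RL = cmath.rect(0.4, -1.1)   # right loudspeaker -> left ear

signal = 1.0 + 0.0j            # the component sent to the left loudspeaker

# Inverse of H_RR, then H_LR, then multiply by -1.
cancel = -signal * H_LR / H_RR

# What arrives at each eardrum is the sum of both loudspeakers' contributions.
right_ear = signal * H_LR + cancel * H_RR
left_ear = signal * H_LL + cancel * H_RL

# The part of the cancellation signal that crosses over to the left ear.
leak = cancel * H_RL

print(abs(right_ear))
print(abs(leak))
```

In this toy case the right ear ends up with (numerically) nothing, while the cancellation signal itself leaks into the left ear at a lower level. That leak is the reason the real system needs further rounds of correction.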

This then means that we have started to build a system where the output of the left loudspeaker is heard ONLY in your left ear. Of course, it’s not perfect because that cancellation signal that I sent out of the right loudspeaker gets to the left ear a little later, so we have to cancel the cancellation signal using the left loudspeaker, and back and forth forever.

If, at the same time, we’re doing the same thing for the other channel, then we’ve built a system where you have the left loudspeaker’s signal in the left ear and the right loudspeaker’s signal in the right ear; just like a pair of headphones!

However, if you get any of these elements wrong, the system will start to under-perform. For example, if the HRTFs that I use to predict your HRTFs are incorrect, then it won’t work as well. Or, if things aren’t time-aligned correctly (because you moved) then the cancellation won’t work.

on to Part 3

# What is a “virtual” loudspeaker? Part 1

#91.1 in a series of articles about the technology behind Bang & Olufsen

Without connecting external loudspeakers, Bang & Olufsen’s Beosound Theatre has a total of 11 independent outputs, each of which can be assigned any Speaker Role (or input channel). Four of these are called “virtual” loudspeakers – but what does this mean? There’s a brief explanation of this concept in the Technical Sound Guide for the Theatre (you’ll find the link at the bottom of this page), which I’ve duplicated in a previous posting. However, let’s dig into this concept a little more deeply.

To begin, let’s put a “perfect” loudspeaker in a free field. This means that it’s in a space that has no surfaces to reflect the sound – so it’s an acoustic field where the sound wave is free to travel outwards forever without hitting anything (or at least appears as if this is the case). We’ll also put a “perfect” microphone in the same space.

We then send an impulse; a very short, very loud “click” to the loudspeaker. (Actually a perfect impulse is infinitely short and infinitely loud, but this is not only inadvisable but impossible, and probably illegal.)

That sound radiates outwards through the free field and reaches the microphone which converts the acoustic signal back to an electrical one so we can look at it.

There are three things to notice when you compare Figure 3 to Figure 2:

• The signal’s level is lower. This is because the microphone is some distance from the loudspeaker.
• The signal is later. This is because the microphone is some distance from the loudspeaker and sound waves travel pretty slowly.
• The general shapes of the signals are identical. This is because I said that the loudspeaker and the microphone were both “perfect” and we’re in a space that is completely free of reflections.
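These three observations can be sketched in a few lines. I’m assuming a speed of sound of 343 m/s, a 48 kHz sample rate, and a simple inverse-distance law for the level (which is what an idealised point source in a free field would give); none of these numbers come from the measurement in the figures.

```python
SPEED_OF_SOUND = 343.0   # m/s, assumed (air at about 20 degrees C)
SAMPLE_RATE = 48000      # Hz, assumed


def free_field_arrival(distance_m, reference_m=1.0):
    """Return (delay in samples, gain relative to the reference distance)."""
    delay_samples = round(distance_m / SPEED_OF_SOUND * SAMPLE_RATE)
    gain = reference_m / distance_m   # inverse-distance law, point source
    return delay_samples, gain


def at_microphone(signal, distance_m):
    """Delay and scale a signal; its shape is otherwise untouched."""
    delay, gain = free_field_arrival(distance_m)
    return [0.0] * delay + [gain * x for x in signal]


click = [1.0, 0.0, 0.0, 0.0]
received = at_microphone(click, 3.0)
print(free_field_arrival(3.0))
```

The received click is later and quieter, but it is still a single click.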

What happens if we take away the microphone and put you in the same place instead?

If we now send the same click to the loudspeaker and look at the “outputs” of your two eardrums (the signals that are sent to your brain), these will look something like this:

These two signals are obviously very different from the one that the microphone “hears” which should not be a surprise: ears aren’t microphones. However, there are some specific things of which we should take note:

• The output of the left eardrum is lower than that of the right eardrum. This is largely because of an effect called “head shadowing” which is exactly what it sounds like. The sound is quieter in your left ear because your head is in the way.
• The signal at the right eardrum is earlier than at the left eardrum. This is because the left eardrum is not only farther away, but the sound has to go around your head to get there.
• The signal at the right eardrum is earlier than the output of the microphone (in Figure 3) because it’s closer to the loudspeaker. (I put the microphone at the location of the centre of the simulated head.) Similarly the left ear output is later because it’s farther away.
• The signal at the right eardrum is full of spikes. This is mostly caused by reflections off the pinna (the flappy thing on the side of your head that you call your “ear”) that arrive at slightly different times, and all add together to make a mess.
• The signal at the left eardrum is “smoother”. This is because the head itself acts as a filter reducing the levels of the high frequency content, which tends to make things less “spiky”.
• Both signals last longer in time. This is the effect of the ear canal (the “hole” in the side of your head that you should NOT stick a pencil in) resonating like a little organ pipe.

The difference between the signals in Figures 2 and 4 is a measurement of the effect that your head (including your shoulders, ears/pinnae) has on the transfer of the sound from the loudspeaker to your eardrums. Consequently, we geeks call it a “head-related transfer function” or HRTF. I’ve plotted this HRTF as a measurement of an impulse in time – but I could have converted it to a frequency response instead (which would include the changes in magnitude and phase for different frequencies).

Here’s the cool thing: If I put a pair of headphones on you and played those two signals in Figure 5 to your two ears, you might be able to convince yourself that you hear the click coming from the same place as where that loudspeaker is located.
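If you want to do the same trick with a signal other than a click, the usual approach is to convolve the signal with the impulse response for each ear. Here is a minimal sketch; the two impulse responses are toy values that I made up to mimic the general behaviour described above, not measured HRTFs.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


# Toy impulse responses for a source on the listener's right.
# Invented numbers, NOT measured HRTFs: the right ear is earlier and
# louder, the left ear is later, quieter and smoother.
hrir_right = [0.0, 1.0, -0.4, 0.3, -0.1]
hrir_left = [0.0, 0.0, 0.0, 0.0, 0.2, 0.3, 0.2, 0.1]

mono = [1.0, 0.5]   # any dry mono signal

# One signal per headphone channel.
headphone_right = convolve(mono, hrir_right)
headphone_left = convolve(mono, hrir_left)
```

Direct convolution like this is slow for long signals, but it keeps the idea visible: one dry signal in, two ear signals out.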

Although this sounds magical, don’t get too excited right away. Unfortunately, as with most things in life, reality tends to get in the way for a number of reasons:

• Your head and ears aren’t the same shape as anyone else’s. Your brain has lived with your head and your ears for a long time, and it’s learned to correlate your HRTFs with the locations of sound sources. If I suddenly feed you a signal that uses my HRTFs, then this trick may or may not work, depending on how similar we are. This is just like borrowing someone else’s glasses. If you have roughly the same prescription, then you can see. However, if the prescriptions are very different, you’ll get a headache very quickly.
• In reality, you’re always moving. So, even if the sound source is not moving, the specific details of the HRTFs are always changing (because the relative positions and angles to your ears are changing) but my system doesn’t know about this – so I’m simulating a system where the loudspeaker moves around you as you rotate your head. Since this never happens in real life, it tends to break the simulation.
• The stuff I showed above doesn’t include reflections, which is how you determine distance to sources. If I wanted to include reflections, each reflection would have to have its own HRTF processing, depending on its angle relative to your head.

However, hypothetically, this can work, and lots of people have tried. The easiest way to do this is to not bother measuring anything. You just take a “dummy head” – a thing that is the same size as an average human head (maybe with an average torso) and average pinnae* – but with microphones where the eardrums are – and you plunk it down in a seat in a concert hall and record the outputs of the two “ears”. You then listen to this over earphones (we don’t use headphones because we want to remove your pinnae from the equation) and you get a “you are there” experience (assuming that the dummy head’s dimensions and shape are about the same as yours). This is what’s known as a binaural recording because it’s a recording that’s done with two ears (instead of two or more “simple” microphones).

If you want to experience this for yourself, plug a pair of headphones into your computer and do a search for the “Virtual Barber Shop” video. However, if you find that it doesn’t work for you, don’t be upset. It just means that you’re different: just like everyone else.* Typically, recordings like this have a strange effect of things sounding very close in the front, and farther away as sources go to the sides. (Personally, I typically don’t hear anything in the front. All of the sources sound like they’re sitting on the back of my neck and shoulders. This might be because I have a fat head (yes, yes… I know…) and small pinnae (yes, yes…. I know…) – or it might indicate some inherent paranoia of which I am not conscious.)

* Of course, depressingly typically, it goes without saying that the sizes and shapes of commercially-available dummy heads are based on averages of measurements of men only. Neither women nor children are interested in binaural recordings or have any relevance to such things, apparently…

on to Part 2

# Filters and Ringing: Part 10

There’s one last thing that I alluded to in a previous part of this series that now needs discussing before I wrap up the topic. Up to now, we’ve looked at how a filter behaves, both in time and magnitude vs. frequency. What we haven’t really dealt with is the question “why are you using a filter in the first place?”

Originally, equalisers were called that because they were used to equalise the high frequency levels that were lost on long-distance telephone transmissions. The kilometres of wire acted as a low-pass filter, and so a circuit had to be used to make the levels of the frequency bands equal again.

Nowadays we use filters and equalisers for all sorts of things – you can use them to add bass or treble because you like it. A loudspeaker developer can use them to correct linear response problems caused by the construction or visual design of the device. They can be used to compensate for the acoustical behaviour of a listening room. Or they can be used to compensate for things like hearing loss. These are just a few examples, but you’ll notice that three of the four of them are used as compensation – just like the original telephone equalisers.

Let’s focus on this application. You have an issue, and you want to fix it with a filter.

IF the problem that you’re trying to fix has a minimum phase characteristic, then a minimum phase filter (implemented either as an analogue circuit or in a DSP) can be used to “fix” the problem not only in the frequency domain – but also in the time domain. IF, however, you use a linear phase filter to fix a minimum phase problem, you might be able to take care of things on a magnitude vs. frequency analysis, but you will NOT fix the problem in the time domain.

This is why you need to know the time-domain behaviour of the problem to choose the correct filter to fix it.

For example, if you’re building a room compensation algorithm, you probably start by doing a measurement of the loudspeaker in a “reference” room / location / environment. This is your target.

You then take the loudspeaker to a different room and measure it again, and you can see the difference between the two.

In order to “undo” this difference with a filter (assuming that this is possible) one strategy is to start by analysing the difference in the two measurements by decomposing it into minimum phase and non-minimum phase components. You can then choose different filters for different tasks. A minimum phase filter can be used to compensate a resonance at a single frequency caused by a room mode. However, the cancellation at a frequency caused by a reflection is not minimum phase, so you can’t just use a filter to boost at that frequency. An octave-smoothed or 1/3-octave smoothed measurement done with pink noise might look like you fixed the problem – but you’ve probably screwed up the time domain.

Another, less intuitive example is when you’re building a loudspeaker, and you want to use a filter to fix a resonance that you can hear. It’s quite possible that the resonance (ringing in the time domain) is actually associated with a dip in the magnitude response (as we saw earlier). This means that, although intuition says “I can hear the resonant frequency sticking out, so I’ll put a dip there with a filter” – in order to correct it properly, you might need to boost it instead. The reason you can hear it is that it’s ringing in the time domain – not because it’s louder. So, a dip makes the problem less audible, but actually worse. In this case, you’re actually just attenuating the symptom, not fixing the problem – like taking an Aspirin because you have a broken leg. Your leg is still broken, you just can’t feel it.

# Filters and Ringing: Part 7

I’m going to start this part by doing something I very, very rarely do: to quote Wikipedia.

“In control theory and signal processing, a linear, time-invariant system is said to be minimum-phase if the system and its inverse are causal and stable.”

However, in my defence, one of the references attached to that statement is Julius O. Smith III, so that makes it okay.

Let’s unwrap that sentence and see if we know enough to know what it’s telling us.

We don’t care about control theory. So let’s ignore that part. We’re only interested in signal processing, where our signal is audio; so we move on.

We already know what a ‘linear, time-invariant’ system (like our filters) is, and we now know that we can say that that system is ‘minimum-phase’ if:

• the system (our peak filter in the previous part, for example)
• and its inverse (our dip filter in the previous part, for example)
• are causal
• and stable

Let’s deal with the ‘stable’ part first. We know that our two filters are stable because we saw that their poles are inside the unit circle in the Z-Plane representation. (We also know it because they both have ringing that decays instead of increases over time.)

We also know that their zeros are also inside the unit circle, since the zeros of each filter are in the same place as the poles of the other filter, which we already said, are inside the unit circle.
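For a second-order filter, this check is easy to do by hand. The sketch below finds the two zeros and the two poles from the filter coefficients and tests whether they all sit inside the unit circle. The coefficient values are hypothetical, chosen only so that the radii are easy to verify.

```python
import cmath


def quadratic_roots(c0, c1, c2):
    """Roots of c0*z**2 + c1*z + c2, i.e. of c0 + c1*z**-1 + c2*z**-2."""
    disc = cmath.sqrt(c1 * c1 - 4 * c0 * c2)
    return (-c1 + disc) / (2 * c0), (-c1 - disc) / (2 * c0)


def inside_unit_circle(c0, c1, c2):
    """True if both roots have a magnitude smaller than 1."""
    return all(abs(r) < 1.0 for r in quadratic_roots(c0, c1, c2))


def is_minimum_phase(b, a):
    """b = numerator (zeros), a = denominator (poles) of a biquad."""
    return inside_unit_circle(*b) and inside_unit_circle(*a)


# Hypothetical coefficients, chosen only to illustrate the test.
b = (1.0, -1.2, 0.72)     # zeros at radius sqrt(0.72), about 0.85
a = (1.0, -1.6, 0.81)     # poles at radius 0.9

print(is_minimum_phase(b, a))
print(inside_unit_circle(1.0, -1.6, 1.21))   # roots at radius 1.1
```

Swapping `b` and `a` gives the inverse filter, and the test gives the same answer for it, which is the point of the definition.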

So, what does ‘causal’ mean? It’s really just a fancy word that means that the output of our filter is determined by either the past or the present, or some combination of the two. In real life, all filters and systems are causal, since they can’t do something based on what will happen in the future.

However, if you are not working in real time, you can easily create systems and filters that are non-causal and have outputs that are created by events in the future. One simple example of this is to record your voice, reverse the track, add some reverb, and then reverse it back again. Now you have reverb that ramps up to a sound before it starts. This is non-causal.
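The reverse, process, reverse trick can be sketched like this. The `decay_tail` function is a crude stand-in for a reverb (just a single decaying feedback path), and its name and the feedback value are my own inventions.

```python
def decay_tail(signal, feedback=0.5):
    """A crude stand-in for reverb: each output sample feeds the next one."""
    out, previous = [], 0.0
    for x in signal:
        previous = x + feedback * previous
        out.append(previous)
    return out


def reversed_tail(signal, feedback=0.5):
    """Reverse, process, reverse back: only possible on a stored recording."""
    return decay_tail(signal[::-1], feedback)[::-1]


click = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]

causal = decay_tail(click)         # tail follows the click
non_causal = reversed_tail(click)  # tail ramps up before the click

print(causal)
print(non_causal)
```

In the second result, the output is already non-zero before the click arrives at the input, which a real-time system could never do.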

## Do I care?

Not yet. But keep the two conditions in mind:

• Both the filter and its inverse must be ‘causal’. The output of a minimum phase filter can only be the result of the present or the past, never the future.
• Both the filter and its inverse must be stable. We like stable…

# Filters and Ringing: Part 6

In this part, I’m going to deviate just a little from something I said at the beginning of this series. To be honest, if I hadn’t admitted this, you probably wouldn’t notice – but I would prefer to keep things clean… The deviation is that, for this part, I’m making a slight change to how Q is defined. This is not serious enough to get into the details of exactly how the definition is different.

Using the slightly-different definition of Q, let’s make a peaking filter with a centre frequency of 1 kHz, a boost of 12 dB and a Q of 2. This will have the response shown below in Figure 1.

Using the same modified definition of Q, let’s also look at the response of a dip filter with the same parameter values, but a gain of -12 dB instead.

If you look at the magnitude responses of these two filters, you’ll see that it looks like they are mirror images of each other. In fact, they are.

If you look at the phase responses of these two filters, you’ll also see that it looks like they are mirror images of each other. In fact, they are.

If you look at their impulse responses, you’ll see that it would be difficult to see that they are related at all… But never mind this.

If I connect the output of the first filter to the input of the second filter, and measure the total throughput of the system, it will look like this:

Just in case you’re suspicious, I didn’t fake this. I actually connected the boost to the dip and sent an impulse through the whole thing and you’re looking at the result. No tricks! (Note that I could have reversed their order with the same total result.)

What you can see here is that the responses of the dip and the boost negate each other. Whatever one does, the other does exactly the opposite.
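If you’d like to try this yourself, here is a minimal sketch. It uses the peaking-filter formulas from the widely used RBJ Audio EQ Cookbook, which is my assumption: the plots above may have been made with different equations. The 48 kHz sample rate is also assumed.

```python
import math

SAMPLE_RATE = 48000  # Hz, assumed


def peaking_biquad(fc_hz, gain_db, q):
    """Peaking-filter coefficients from the RBJ Audio EQ Cookbook."""
    big_a = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * fc_hz / SAMPLE_RATE
    alpha = math.sin(w0) / (2 * q)
    b = (1 + alpha * big_a, -2 * math.cos(w0), 1 - alpha * big_a)
    a = (1 + alpha / big_a, -2 * math.cos(w0), 1 - alpha / big_a)
    return b, a


def run_biquad(coeffs, signal):
    """Direct-form I difference equation, one sample at a time."""
    (b0, b1, b2), (a0, a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out


impulse = [1.0] + [0.0] * 511

# Boost first, then feed the result into the dip.
boosted = run_biquad(peaking_biquad(1000, 12, 2), impulse)
restored = run_biquad(peaking_biquad(1000, -12, 2), boosted)

# Largest difference between the final output and the original impulse.
worst_error = max(abs(r - i) for r, i in zip(restored, impulse))
print(worst_error)
```

The boosted signal rings after the click, but once it has been through the dip, the only thing left is the original click plus some rounding error.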

Generally speaking, we audio geeks use some special words to describe not-very special cases like this.

Often, you’ll hear us talking about a linear system which is a fancy way of saying ‘the effects of this system can be undone’. In this example, the dip filter can ‘undo’ the effect of the boost (and vice versa) therefore both must be linear filters.

Just as often, you’ll hear us talking about time-invariant systems, which just means that they don’t change over time. Because I implemented those two filters using equations done on my computer, if I run the math again tomorrow, I’ll get exactly the same answer. If I test them using an impulse that is quieter or louder, I also get exactly the same responses. (If I had implemented them using resistors and capacitors and transistors or vacuum tubes, I might not get the same answer tomorrow or with a different signal level because of temperature changes, for example. Although now I’m really splitting hairs, just to make a point.)

The reason I said “just as often” is because, normally we use the two terms together as a package deal. So, we ask whether a system (like something as simple as a filter or as complicated as a reverb unit or an upmixing algorithm) is Linear Time-Invariant or LTI. This is an important question because it packs a lot of information in it.

For example, if a reverb unit is LTI, then I can measure it today with an impulse, and I know that it will behave the same tomorrow with lute music or a snare drum. It does the same thing all day, every day, regardless of the input signal or its level. One measurement, and I can go away and analyse it for the rest of the week.

If it’s not LTI, then its characteristics will change for some reason that I don’t necessarily know. Maybe the internal delays are modulating in time, so its response in 10 seconds will be different than it is now. Maybe it has a compressor or a noise gate built in, so it changes its behaviour according to the level of the signal.
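One simple way to catch a system that is not LTI is to measure it at two different levels and compare. The sketch below does this for a simple smoothing filter and for a hard limiter (both are hypothetical examples of mine). Note that it only tests whether the output scales with the input, which is just one of the conditions for linearity.

```python
def smoother(signal):
    """A one-pole low-pass filter: linear and time-invariant."""
    out, state = [], 0.0
    for x in signal:
        state = 0.5 * x + 0.5 * state
        out.append(state)
    return out


def clipper(signal, ceiling=1.0):
    """A hard limiter: its behaviour depends on the signal level."""
    return [max(-ceiling, min(ceiling, x)) for x in signal]


def scales_with_level(system, signal, factor=4.0):
    """True if 'factor' times the input gives 'factor' times the output."""
    quiet = system(signal)
    loud = system([factor * x for x in signal])
    return all(abs(l - factor * q) < 1e-12 for q, l in zip(quiet, loud))


test_signal = [0.0, 0.5, 0.9, -0.7, 0.2, 0.0]

print(scales_with_level(smoother, test_signal))
print(scales_with_level(clipper, test_signal))
```

The smoother passes and the limiter fails, so one measurement of the limiter would tell you very little about how it behaves at a different level.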

Let’s get back to our (rather simple) peak / dip filter example. We know they’re LTI (because I said so – and you have to trust me). We also know that the dip filter is the opposite of the boost. The question is “how, exactly, did I make this happen?”

The general answer to this question has already been given – the magnitude and the phase responses are mirror images of each other. Therefore, for any given frequency, one filter boosts by the same amount that the other cuts, and one filter advances in phase by the same amount that the other delays in phase.

The more geeky answer to this question requires that we look at the Z-Plane, which I’ve talked about thoroughly in another series of postings starting with this one. I’ll repeat myself a little by saying that a Z-Plane representation shows a different way of looking at the ‘ingredients’ in a filter. It contains ‘poles’ that are placed at frequencies that are infinitely boosted, and ‘zeros’ that are placed at frequencies that are infinitely cut. By carefully placing poles and zeros relative to each other in the Z-Plane, you can decide how the filter will behave for other frequencies between 0 Hz and the Nyquist frequency.

When you design (or analyse) filters this way, there are a couple of basic rules:

The ‘safe zone’ in the Z-Plane is defined by a circle. If you start placing poles outside it, then the filter can become unstable. If a filter is unstable, this means that its ringing can get louder over time instead of decaying.

If you place a pole in exactly the same place as a zero, they cancel each other out, and the total result is as if neither were there.

So, let’s look at our two filters above in their Z-Plane representations.

Admittedly, the resolution of the display in the software that I’m using to show this isn’t great, but if you compare the Z-Plane plots on the left and right, you can see that the zeros (marked with ‘o’) and the poles (‘x’) swap places. Just to make things a little clearer, I moved the centre frequency to 10 kHz and kept the gain and Q values the same. These are shown in Figure 5.

What’s the point of showing you this? The Magnitude and Phase response plots (which, combined, comprise the filters’ Frequency Responses) are ‘just’ descriptions of the behaviour of the filter. They tell you what happens to a signal that goes through them.

The Z-Plane representations show you how the filters are actually implemented.

It’s like the difference between reading a description of how a cake tastes and reading the recipe.

What you can see in the Z-Plane is not only that the responses of the filters negate each other: they’re built to ensure that this is the case. The poles and zeros of one filter cancel the zeros and poles of the other, and vice versa.
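The swap can also be checked numerically. This sketch again assumes the RBJ Audio EQ Cookbook peaking-filter formulas and a 48 kHz sample rate (my choices, not necessarily the ones behind the plots), and computes where the poles and zeros of the boost and the dip land.

```python
import cmath
import math

SAMPLE_RATE = 48000  # Hz, assumed


def peaking_biquad(fc_hz, gain_db, q):
    """Peaking-filter coefficients from the RBJ Audio EQ Cookbook."""
    big_a = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * fc_hz / SAMPLE_RATE
    alpha = math.sin(w0) / (2 * q)
    b = (1 + alpha * big_a, -2 * math.cos(w0), 1 - alpha * big_a)
    a = (1 + alpha / big_a, -2 * math.cos(w0), 1 - alpha / big_a)
    return b, a


def roots(c0, c1, c2):
    """Both roots of c0*z**2 + c1*z + c2, sorted by imaginary part."""
    disc = cmath.sqrt(c1 * c1 - 4 * c0 * c2)
    pair = ((-c1 + disc) / (2 * c0), (-c1 - disc) / (2 * c0))
    return sorted(pair, key=lambda z: z.imag)


boost_b, boost_a = peaking_biquad(1000, 12, 2)
dip_b, dip_a = peaking_biquad(1000, -12, 2)

# Zeros come from the numerator (b), poles from the denominator (a).
boost_zeros, boost_poles = roots(*boost_b), roots(*boost_a)
dip_zeros, dip_poles = roots(*dip_b), roots(*dip_a)

print(boost_zeros, dip_poles)
print(boost_poles, dip_zeros)
```

With these formulas, the zeros of the boost land where the poles of the dip are (and vice versa), and all of them have a magnitude smaller than 1.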

There’s one other extra piece of information that you already know. The fact that the poles for any of these filters are inside the circle helps to tell us that they’re stable and therefore LTI. It also tells us something else that we’ll talk about in the next part.

# Filters and Ringing: Part 5

## Phase

There are lots of people in audio who will make some claims about one kind of filter being better than another kind of filter because of something to do with the time response. They’ll throw around words like “minimum phase” or “linear phase” or “apodising” or other names, which sound impressive, but don’t really mean anything to normal people. In fact, in most cases, they don’t even mean anything to abnormal people (a.k.a. audio engineers). They’ll even make some statements about why one is better than the other, with some psychoacoustic claims to back themselves up.

One thing to remember is that these terms are very general headings that each sit on top of a lot of sub-headings. It’s also important to separate these terms from the incorrectly-interchanged terms ‘FIR’ and ‘IIR’ (which stand for ‘Finite Impulse Response’ and ‘Infinite Impulse Response’) which are different descriptions for the same filters. For example, many people say “FIR” when they mean “linear phase”, forgetting that an FIR can be used to create a non-linear phase filter.

In this posting, we’ll start to look at the difference between ‘minimum phase’ and ‘linear phase’ filters, but this requires a little set-up first.

Up to now in this series of postings, we’ve only looked at the filters’ magnitude responses (the gain of the filter vs. frequency) and time responses (or impulse responses). Let’s shift gears a little and think about the phase response instead.

Remember from Part 1, we looked at how an impulse is the result of adding an infinite number of cosine waves that all started at the beginning of time, and will continue until the end of time. Those waves all cancel each other out at all moments in time (forwards and backwards) except for that one instant (which we call Time = 0, also known as NOW) where they all add up to make a click.

What happens when we shift the time alignment? The intuitive answer is that we get something different than a simple click. The more we shift the frequency components in time, the more different we get from a simple click.

However, when we talk about shifting frequency components in time, it doesn’t make sense to actually measure that shift in time. I know that sounds like a stupid thing to say, so I’ll illustrate what I mean…

We saw that if we add a bunch of cosine waves together they start looking like an impulse, as shown in Figure 1.

What happens if I delay all of those individual waves by 0.5 second (or 500 ms)? The result is shown in Figure 2.

It should be pretty obvious that the result in Figure 2 is identical to the result in Figure 1. The only difference is that it’s been shifted in time by 500 ms. The shape of the wave has not changed because we shifted all of the waves together, so their relationship to each other has not changed.

So, if we want to change the shape of the total result, we need to shift the components relative to each other, as shown in Figure 3.

Figure 3 shows the same components with the same amplitudes, but shifted so that they all cross the T=0 point at the 0 line instead of at the maximum (as in Figure 1). This means that I’ve shifted each component individually by 90º, which is a different amount of time (in seconds) for each one. (In other words, I’m summing sine waves instead of cosine waves.) The summed result is quite different, as you can see in the bottom plot.
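Here is a minimal sketch of the difference between the two cases at T=0. The five component frequencies are arbitrary values of mine; they are not the ones used in the figures.

```python
import math


def summed_wave(freqs_hz, t, phase_deg=0.0):
    """Sum equal-amplitude cosines, each shifted by the same phase angle."""
    phase = math.radians(phase_deg)
    return sum(math.cos(2 * math.pi * f * t - phase) for f in freqs_hz)


freqs = [1, 2, 3, 4, 5]            # Hz; five components, as an example

at_zero_cosines = summed_wave(freqs, 0.0)              # all peaks line up
at_zero_sines = summed_wave(freqs, 0.0, phase_deg=90)  # all cross zero

# A 90 degree shift is a different time shift for every component.
shift_seconds = [0.25 / f for f in freqs]

print(at_zero_cosines, at_zero_sines)
print(shift_seconds)
```

The same 90º corresponds to a quarter of a second at 1 Hz but only a twentieth of a second at 5 Hz, which is why it makes more sense to talk about the shift in phase than in time.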

You can also shift some components differently (measured in phase) as well. For example, take a look at Figure 4. In that one, the first 4 components with the lowest frequencies are cosine waves, and I’ve shifted the 5th component by 90º. As you can see in the bottom plot, just shifting one component can make a large difference.

And it probably goes without saying, but I’ll say it anyway, that if you change the relative levels of the components, you’ll also change their total sum, as shown in Figure 5.

Let’s turn this around (finally…). In the examples above, I was playing with the components’ amplitudes and relative phases to produce different total summed results, even though the frequencies of the components were the same each time.

If we think of this backwards, we can conclude that, if the time response of a filter is NOT a perfect impulse, then it must have done something to the relative levels and/or the relative phases of the collection of infinite frequency components that went through it. Using math (the same Fourier Transform that I mentioned in Part 2) we can take the impulse response and calculate what happened to the components, both in amplitude (the Magnitude Response) and phase (the Phase Response), which together give us the filter’s Frequency Response.
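As a rough illustration of that calculation, the sketch below runs a plain (slow) Discrete Fourier Transform on two very short impulse responses. Both test signals are made-up examples: a perfect click, and a click that has been smeared over two samples.

```python
import cmath
import math


def frequency_response(impulse_response):
    """Plain DFT, returning (magnitude in dB, phase in degrees) per bin."""
    n = len(impulse_response)
    result = []
    for k in range(n // 2 + 1):
        bin_value = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(impulse_response)
        )
        # The floor of 1e-12 only prevents log10(0) for an empty bin.
        magnitude_db = 20 * math.log10(max(abs(bin_value), 1e-12))
        result.append((magnitude_db, math.degrees(cmath.phase(bin_value))))
    return result


perfect = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
smeared = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # a two-sample average

for label, ir in (("perfect", perfect), ("smeared", smeared)):
    print(label, [(round(m, 1), round(p, 1)) for m, p in frequency_response(ir)])
```

The perfect click comes out with no change in level or phase at any frequency. The smeared one shows both a level change and a phase shift that depend on frequency.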

Let’s look at an example: a bandpass filter with a centre frequency of 1 kHz and a Q of 2, shown in Figure 6.

The top and middle plots in Figure 6 should not come as surprises now, so let’s talk about that bottom plot. What it shows us, generally speaking, is that if you send a sinusoidal wave through the bandpass filter at the centre frequency (1 kHz) then the output will have the same phase as the input, since the red line is at 0 degrees at 1 kHz.

If the sinusoidal wave that you send in is above 1 kHz, then the output will be later in phase than the input. This does NOT necessarily mean that it’s delayed in time. We can’t know this because as soon as I said “sinusoidal wave”, this implied that it has no start or stop time – it’s just a sinusoidal tone that has always been there and will always be there. (In order to start or stop it, you need other frequency components.)

Philosophically, this may be difficult to consider – but think of it the same way you experience seeing Niagara Falls. You really have no first-hand knowledge of when the water started falling or when it will stop – it’s as if it’s always been doing this and it always will – and you just get to see it for a small slice of time in its “infinitely”-long existence.

It’s really important to remember that what we’re looking at in Figure 8 is a phase shift and NOT a time delay (even though it looks like it). Repeat this sentence until you believe it before looking at the next plot.

Figure 9 shows an example of why you have to believe that we’re not talking about a time delay – just a phase shift. As you can see there, in the case of a bandpass filter, if the signal frequency is below the centre frequency, the phase shift is backwards, which looks like the output is ahead of the input. Of course, this is impossible. Bandpass filters are not time machines.

Now go back and look at the bottom plot in Figure 6. You’ll see that frequencies above the centre frequency of the filter (1 kHz) have a phase shift that is below 0º – they’re negative numbers approaching -90º as the frequency increases. Compare this to Figure 8 and you can make the link that a negative phase shift is “later” (in phase, not in time!).

Conversely, lower frequencies have a positive phase shift in Figure 6, which (as can be seen in Figure 9) correspond to a phase shift that moves “earlier”.
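To make the signs concrete, here is a minimal Python sketch. It assumes a textbook analogue second-order bandpass, normalised to a gain of 1 at the centre frequency, which is not necessarily how the filter in Figure 6 was built. The function names are mine, chosen only for illustration.

```python
import cmath
import math

def bandpass_response(f, fc=1000.0, q=2.0):
    """Complex response of an assumed analogue second-order bandpass (gain of 1 at fc)."""
    s = 1j * f / fc                      # frequency normalised to the centre frequency
    return (s / q) / (s * s + s / q + 1)

def phase_degrees(f, fc=1000.0, q=2.0):
    """Phase shift of the bandpass at frequency f, in degrees."""
    return math.degrees(cmath.phase(bandpass_response(f, fc, q)))

for f in (100.0, 500.0, 1000.0, 2000.0, 10000.0):
    print(f"{f:8.0f} Hz  phase = {phase_degrees(f):+7.2f} degrees")
```

Under that assumption, frequencies below 1 kHz come out with a positive phase, 1 kHz comes out at 0º, and frequencies above 1 kHz come out negative, heading towards -90º.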

Remember that a peak/dip filter is a combination of a bandpass and a throughput. So now let’s look at the phase shift that results when you use one.

Looking at the magnitude response, it should now be fairly easy to see the merging of a throughput (which would be a straight line at 0 dB across all frequencies) and a bandpass (which causes the bump around 1 kHz).

It should be almost as easy to see the merging in the phase response as well. A throughput would have a phase response of 0º at all frequencies – which is why the plot starts at 0º in the very low frequencies and ends at 0º in the very high frequencies (because the bandpass doesn’t have much contribution out there). In the middle, the phase response of the bandpass shows up; so around 1 kHz, the phase responses of Figure 10 and 6 are very similar.

Let’s change the Q and see what happens.

Figure 11 shows the same peaking filter with the Q increased to 10. Notice 5 things (not in any obvious order):

• The bump in the magnitude response is narrower
• The ringing starts at a lower level
• The impulse response is ringing for a lot longer in time
• The deviation from 0º in the phase response has a narrower bandwidth.
• The slope of the phase response at 1 kHz is steeper.

Let’s put some of these together. I’ll take these in a slightly different order, but after reading the paragraphs below, the points above should all interlock.

The bump in the magnitude response is narrower; therefore it has a smaller bandwidth. This should be expected, since Q = Fc/BW, so if we don’t change Fc, then the higher Q goes, the smaller BW gets.

Notice that both the filter in Figure 10 and the filter in Figure 11 have a gain at Fc of 12 dB. However, since the Q is lower in Figure 10, this means that, overall, more frequencies are boosted by more. Consequently, if you have a signal that has all frequencies in it (say, pink noise or Metallica), then the output of Figure 10’s filter will be generally louder than the output of Figure 11’s. Another way to see this is that the level at the start of the ‘tail’ of the impulse response is higher in Figure 10 than in Figure 11.

There is a direct link between the length of time the filter rings (which you can see in the impulse responses) and the slope of the phase response. The steeper the slope at a given frequency, the longer the filter will ring at that frequency. So, if you only look at the phase response plots, it’s easy to tell which of the two filters will ring for a longer time, and at what frequency. This will come in handy in the next part.

# Filters and Ringing: Part 4

Let’s put together a couple of things that were said in the last postings, which should help to support each other:

A peak or a dip filter is created by adding a bandpass filter to a throughput, as shown in Figure 1.

To change from peak to dip, you switch the polarity of the bandpass portion by making the “gain” negative instead of positive. (In other words, you subtract the bandpass from the throughput instead of adding it). To change the gain of the peak/dip filter, you change the gain of the bandpass portion. To change the Q of the peak/dip, you change the Q of the bandpass.

We also saw at the end of Part 3 that changing the gain does not change the rate of the decay.

This should all come together nicely to make sense for the first of the three points. For example, since the bandpass portion is the part that’s ringing, and since changing the gain of the peak (or dip) is just a matter of changing the gain applied to the bandpass portion, then there is no reason why the decay rate of the ringing should change. It will start at a higher or lower level, but its decay slope will be the same.

## Q vs Time

We also saw at the end of Part 3 that changing the Q will change the slope of the decay inversely proportionally, but that changing the frequency will change the slope of the decay proportionally.

There is a nice little rule-of-thumb that’s used by electrical engineers for measuring the Q of a filter. Let’s say that you can’t (or couldn’t be bothered to take the time to) measure the frequency or magnitude response, and you want to figure out the Q based on the time response only. You can calculate this by looking at the filter’s impulse response.

For example, Figure 2 shows the initial part of the impulse response of an unknown filter. I’ve highlighted two points that are reasonably close to the tops of two of the cosine wave cycles. I picked the first one (on the left) and then noted its Y value (Y = 0.027). Then I found a top of another wave that was as close to half that value as I could find. You can see there that it’s 2 cycles later, where Y = 0.0149.

So, you take the number of cycles it takes to drop by 50% (in this example, 2 cycles) and multiply that by 4.53, which results in a value of about 9. This is a good estimate of the Q of the filter (which is actually 10, if I measure it using the -3 dB points in the magnitude response).
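Here is that rule of thumb as a small Python sketch. The first function is the rule exactly as described. The second is my own variation, not something from the measurement above: it assumes the ringing decays exponentially, in which case the factor 4.53 is π / ln(2), and the second peak no longer has to be exactly half of the first. Both function names are hypothetical.

```python
import math

CYCLES_TO_Q = math.pi / math.log(2)      # about 4.53, assuming an exponential decay

def q_from_half_decay(cycles_to_halve):
    """Rule of thumb: Q is about 4.53 times the number of cycles needed to drop by 50%."""
    return CYCLES_TO_Q * cycles_to_halve

def q_from_two_peaks(y_first, y_later, cycles_between):
    """Variation (assumption): use the actual ratio of two peak values."""
    return math.pi * cycles_between / math.log(y_first / y_later)

print(round(q_from_half_decay(2), 1))
print(round(q_from_two_peaks(0.027, 0.0149, 2), 1))
```

With the two values read off the plot (0.027 and 0.0149, 2 cycles apart), the second function lands a little above 10 rather than at about 9, because 0.0149 is slightly more than half of 0.027.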

Note that it doesn’t matter which cycle I chose to get the first value, since the rate of decay is the same through the entire time response of the filter. In other words, if I chose the 3rd cycle to do the first measurement, I would have found that the 5th cycle is about 50% lower because it’s also 2 cycles later.

It also doesn’t matter whether we’re talking about peaks or dips, since, as we already know, from a perspective of the individual building blocks of the filter, these are the same thing.

## So what?

Of course, most normal people aren’t measuring the time response of filters to calculate the Q. However, this piece of information is good from the opposite perspective: if you know the Q of the filter, you can figure out how fast it’s decaying. For example, a filter with a Q of 2 will take 2 / 4.53 = 0.44 cycles to decay by 50% (or 6 dB). If you know the frequency, then you can translate that into a decay rate per second, because the period in seconds (the total time of one cycle of the wave) = 1 / Fc.

So, if that filter with a Q of 2 has an Fc of 100 Hz, then the period is 1/100 = 0.01 sec, and therefore it will decay by 6 dB (50%) in 0.44 cycles * 0.01 sec/cycle = 0.0044 sec or 4.4 ms.

If the Fc of the filter is 5 kHz, then the period is 1/5000 = 0.0002 sec, and therefore it will decay by 6 dB in 0.0002 * 0.44 = 0.000088 sec = 88 µsec. (This is roughly equivalent to 4 samples at 48 kHz.)
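The same arithmetic can be wrapped in a few lines of Python, in case you want to try other values of Q and Fc. This is just a sketch of the calculation above; the function name is made up.

```python
def half_decay_seconds(q, fc_hz, cycles_factor=4.53):
    """Time for the ringing to decay by 50% (6 dB), using Q = 4.53 * cycles."""
    cycles = q / cycles_factor           # number of cycles needed to drop by 6 dB
    period = 1.0 / fc_hz                 # duration of one cycle in seconds
    return cycles * period

for fc in (100.0, 5000.0):
    t = half_decay_seconds(2.0, fc)
    print(f"Q = 2, Fc = {fc:.0f} Hz: 6 dB decay in {t * 1e3:.3f} ms")
```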

Another good thing to remember is that Q = Fc / BW where BW is the bandwidth of the response measured between the two -3 dB points. This means, for example, that if Q = 1, then Fc = BW, therefore the bandwidth is about 1 octave. If Q = 2, then the bandwidth is about 1/2 of an octave, if Q = 12 then the bandwidth is about 1 semitone (1/12th of an octave), and so on.
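If you want something more precise than those round numbers, here is a hedged sketch. The bandwidth in Hz follows directly from Q = Fc / BW. The bandwidth in octaves needs an extra assumption that is mine, not a measurement: that the two -3 dB frequencies sit symmetrically around Fc on a logarithmic axis (so that their product is Fc squared), as they do for a textbook second-order bandpass.

```python
import math

def bandwidth_hz(fc_hz, q):
    """Bandwidth between the -3 dB points, from Q = Fc / BW."""
    return fc_hz / q

def bandwidth_octaves(q):
    """Bandwidth in octaves, assuming f_low * f_high == Fc ** 2."""
    return 2 * math.asinh(1 / (2 * q)) / math.log(2)

for q in (1.0, 2.0, 12.0):
    octaves = bandwidth_octaves(q)
    print(f"Q = {q:4.1f}: {bandwidth_hz(1000.0, q):6.1f} Hz at Fc = 1 kHz, "
          f"{octaves:.2f} octaves ({12 * octaves:.1f} semitones)")
```

Under that assumption the exact figures come out somewhat wider than the round numbers: roughly 1.4 octaves for a Q of 1, 0.7 of an octave for a Q of 2, and 1.4 semitones for a Q of 12. So the octave figures are best treated as a way to get into the right neighbourhood.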

# Filters and Ringing: Part 3

Now we’ve seen that if we have a filter that results in either a peak or a dip in the magnitude response, we’ll also get ringing in the time response. We’ve also seen that the frequency of the ringing is the centre frequency of the filter. Now let’s dig a little deeper into the behaviour of that ringing; or, more specifically its decay characteristics.

We’ll repeat the process from Part 2: measure the impulse response of a peaking filter where Fc = 1 kHz, gain = +12 dB, and Q = 2. However, this time I’ll look at the time response with a different scaling. Instead of plotting the linear value over time, I’ll convert each instantaneous value to dB and plot that. This looks like Figure 1.

The important thing to notice here is that, when I plot the instantaneous amplitude in decibels (in other words, on a logarithmic scale), the decay is a straight line with a slope.

Let’s get two things out of the way here. This isn’t really decibels, because decibels requires some time averaging. Also, I’m actually plotting the absolute value of the impulse response in a decibel scale, because if I try to calculate the log of a negative number, things get ugly. This means that the math I’m actually using to create the bottom plot is

20 * log10(abs(signal))
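In Python, that conversion might look like the sketch below. The clamp for values of 0 is my addition, since the log of 0 is undefined. The test signal is a made-up decaying cosine whose amplitude halves every cycle; it is not the filter measured here.

```python
import math

def instantaneous_db(value, floor_db=-200.0):
    """20 * log10(abs(value)), clamped so that a value of 0 does not break the log."""
    magnitude = abs(value)
    if magnitude == 0.0:
        return floor_db
    return max(20 * math.log10(magnitude), floor_db)

fs = 48000                               # sample rate in Hz (assumed)
fc = 1000                                # frequency of the hypothetical ringing in Hz
samples_per_cycle = fs // fc

# Hypothetical ringing: a cosine whose amplitude drops by half every cycle
signal = [0.5 ** (n / samples_per_cycle) * math.cos(2 * math.pi * fc * n / fs)
          for n in range(5 * samples_per_cycle)]

# Level at the start of each cycle, where the cosine itself is at 1
levels_db = [instantaneous_db(signal[c * samples_per_cycle]) for c in range(5)]
steps_db = [levels_db[i + 1] - levels_db[i] for i in range(4)]
print([round(step, 2) for step in steps_db])
```

The level drops by the same number of decibels from one cycle to the next, which is why an exponential decay shows up as a straight line on a decibel scale.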

If I draw a line across the tops of the bumps in that plot, I can look at the decay of the filter’s ringing as in Figure 2.

For this filter, the decay rate of the ringing is -1360 dB per second (which is very fast). Let’s change some parameters and see what happens.

If I increase the gain of the filter without changing the Fc or the Q, I get the following:

I could plot lots more of these so that you start to see a pattern, but I’ll jump to the punch lines and you can use the plots above to check that things make sense.

If I have a filter that is using a definition of Q = Fc / BW (where BW is the distance between the -3 dB points down from the maximum), then:

• Changing the gain does not change the rate of the decay (at least, as long as it’s a boost, according to what we’ve seen so far…)
• Changing the Q will change the slope of the decay inversely proportionally if we’re measuring the slope in dB/sec. For example, if I multiply the Q by 2, the ringing decays twice as slowly. If I multiply the Q by 10, the ringing will take 10 times longer to decay to the same level.
• Changing the frequency will change the slope of the decay proportionally if we’re measuring the slope in dB/sec. For example, if I multiply the frequency by 2, the ringing will decay twice as fast.
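Those three points can be turned into a small scaling rule: the slope in dB/sec is proportional to Fc / Q. The sketch below starts from the decay rate quoted above for the filter with Fc = 1 kHz and Q = 2 and scales it to other settings. The function and variable names are mine, and the rule itself is only as good as the three points in the list.

```python
def scaled_decay_slope(ref_slope_db_per_s, ref_q, ref_fc_hz, q, fc_hz):
    """Scale a known decay slope to another Q and Fc (slope proportional to Fc / Q)."""
    return ref_slope_db_per_s * (fc_hz / ref_fc_hz) * (ref_q / q)

# Reference values taken from the text: -1360 dB per second at Fc = 1 kHz, Q = 2
REF = dict(ref_slope_db_per_s=-1360.0, ref_q=2.0, ref_fc_hz=1000.0)

print(scaled_decay_slope(q=4.0, fc_hz=1000.0, **REF))   # Q doubled: decays half as fast
print(scaled_decay_slope(q=2.0, fc_hz=2000.0, **REF))   # Fc doubled: decays twice as fast
```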

Let’s talk about the last of these first, since it’s the easiest to understand conceptually. In the plots above, I’m showing the time in seconds. So, the higher the frequency, the more cycles I’m showing in the same plot. However, if I were plotting time in cycles of the cosine wave instead, the slope would be the same regardless of frequency.

In other words, the level of the ringing decays by the same amount per number of cycles of the cosine wave.

This is why, if you count the number of “bumps” in the dB plots in Figure 2 and 5, you’ll see that they are the same number. It takes about 12 cycles to get down to -100 dB, but the shorter the cycles (because the frequency is higher) the faster you get there when measuring in seconds. If the X-axis were not “Time in milliseconds”, but “Time in periods of the centre frequency” instead, then the slopes would be identical in Figures 2 and 5.

# Filters and Ringing: Part 2

## Rocks, Guitars, and Children

If you throw a rock into a pond on a windless day, you’ll see the ripples moving away in an expanding circle from the place where the rock hit. The ripples are places on the water where the water is either higher or lower than where it was before the rock hit. The water itself only moves up and down, but the waves expand sideways. (You can see this if there is something floating on the water, for example – it bobs up and down as the waves go by.)

A similar thing happens when you pluck a guitar string. The point where your finger plucked is the same as the point where the rock landed in the water, and waves radiate away from that place on the string in two directions (because there are only two directions to travel in on a string: this way and that way). However, when those waves reach the end of the string, they reflect and come back in the opposite direction.

In both cases, the water and the guitar string, the wave has some speed at which it travels. It’s slow enough on the water for you to watch it, but it’s much too fast on a guitar string. In fact, it’s so fast that, when you pluck it, the wave travels to the end of the string, reflects in the opposite direction, hits the other end of the string, reflects again, and gets back to where you plucked it in about 1/82nd of a second if it’s the low E string. Since the wave doesn’t stop there – it keeps going, repeating the back-and-forth journey along the length of the string every 1/82nd of a second, then we hear a note with a fundamental frequency of 82 Hz (82 cycles per second): a low E.

That ringing that happens on the guitar string will happen no matter how you start the movement on it. You could hit the string with a chopstick, you could just thump the side of the guitar with your fist, you could even stand next to the guitar and cough loudly. All of these things will “inject” energy into the string, causing it to move, and the wave starts banging back and forth.

The rate of repetition is dependent on two things: the length of the string and the speed of the wave. The speed of the wave is dependent on two things: the mass of the string (e.g. how heavy is 1 m of it?) and the tension (how tightly is it stretched?) Increase the tension, and you increase the speed of the wave. Decrease the mass and you increase the speed of the wave. Increase the speed of the wave, and the repetition takes less time, so you hear a higher note.
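For the curious, those relationships can be written down for an idealised string, where the wave speed is the square root of the tension divided by the mass per metre. The numbers below (string length, tension, and mass per metre) are illustrative guesses that I picked to land near a low E; they are not measurements of a real guitar.

```python
import math

def wave_speed(tension_n, mass_per_metre_kg):
    """Speed of a wave on an ideal string, in metres per second."""
    return math.sqrt(tension_n / mass_per_metre_kg)

def fundamental_hz(length_m, tension_n, mass_per_metre_kg):
    """One cycle is a round trip: down the string and back again (2 * length)."""
    return wave_speed(tension_n, mass_per_metre_kg) / (2 * length_m)

# Illustrative values only: 0.648 m long, 72 N of tension, 6.3 grams per metre
base = fundamental_hz(0.648, 72.0, 0.0063)
print(f"Illustrative low E string: {base:.1f} Hz")
print(f"More tension:              {fundamental_hz(0.648, 80.0, 0.0063):.1f} Hz")
print(f"Less mass:                 {fundamental_hz(0.648, 72.0, 0.0050):.1f} Hz")
```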

That frequency at which the string will naturally ring is called a resonance. A child on a swing will go back and forth at the same rate (number of times per second) no matter how gently or forcefully you push them – apply energy, and the system will resonate.

Now, let’s think about that push of the child, the rock hitting the water, or the pluck of the guitar string. All of those things are a short injection of energy: a kind of impulse, and the way the child, the water, or the string behaves afterwards is its impulse response – how it responds to that impulse.

But here’s a strange thing to consider. This means that the note (the frequency) that you hear from the guitar string was one of the many frequencies in the initial pluck itself.

So, another way to think of this is that, by plucking the string, you inject a signal with all frequencies in it, and all of those frequencies decay (“die away”) very quickly except for one.

Okay, okay, if we’re going to be pedantic, I should be including not only the fundamental frequency but all of the additional harmonics; typically multiples of that frequency. But we don’t need to complicate things with the truth at the moment…

## What does this have to do with filters?

From a “big picture” point of view, a guitar string is a filter. I feed in some signal (the pluck) and I get out a modified version of that signal (the note ringing). From the same perspective, a filter in an equaliser is the same: I feed in a signal (music) and I get out a modified version of it (the same music, but slightly louder at 1 kHz, for example). What’s interesting is that the two things basically work the same way.

Let’s take the example of the filter at the end of Part 1: a peaking filter with a boost of 12 dB at 1 kHz, with a Q of 2. If I feed in a sine wave (which only contains energy at 1 frequency) at a very low frequency (say, 100 Hz or lower) then the level of the output will equal that of the input. If I do the same with a very high frequency (say, 10 kHz) then the level of the output will also equal that of the input. However, if I feed in a sine wave at 1 kHz, the output will be 4 times louder than the input (+12 dB = 4 times the amplitude because 20*log10(4) = 12-ish).
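If you want to check the “12-ish” for yourself, the conversion between a gain in dB and an amplitude ratio is a one-liner in each direction. The function names here are just for illustration.

```python
import math

def amplitude_ratio_to_db(ratio):
    """Convert an amplitude ratio (output / input) to decibels."""
    return 20 * math.log10(ratio)

def db_to_amplitude_ratio(db):
    """Convert a gain in decibels back to an amplitude ratio."""
    return 10 ** (db / 20)

print(f"4 times the amplitude = {amplitude_ratio_to_db(4):.2f} dB")
print(f"+12 dB = {db_to_amplitude_ratio(12):.2f} times the amplitude")
```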

At some other frequency around 1 kHz, I’ll get a different answer. However, this is a VERY long and tedious way to measure the magnitude response of the filter. Another option is to measure its impulse response.

If I feed the input of the filter with an impulse (which is a sound that contains all frequencies at the same level, as we saw in Part 1), and look at the filter’s output in time, it might look like this:

Notice that the impulse looks like an impulse at Time = 0, but then something extra happens afterwards – like a guitar string ringing in time. If I zoom in vertically and look at the same plot, it will look like Figure 3.

And if we zoom in horizontally as well, it will look like this.

So, as you can see there, it’s almost as if we kept the impulse, and then just added a cosine wave with a period (a repetition time) of 1 ms, starting at Time = 0 and decaying over time. In fact, that’s exactly what the filter does.

## Time response to Frequency response

The excuse I gave above for sending an impulse through the filter (instead of sine waves) was that this will be a faster way to measure its response. The time response of the filter is already done. We can see that in the figures above. But how do we see the filter’s frequency response? This is done using a clever bit of math called a Fourier Transform, which lets you take a signal in time, and analyse its content by frequency. I won’t explain that here, but if you’re interested in how it works, you can start by reading this.

If I take the total impulse response (also known as a time response measurement) of the filter (in other words, I send in an impulse, record the output, and don’t stop recording until the ringing has decayed to a level low enough that I no longer care, for the purposes of this discussion at least) and then do a Fourier Transform of the recording, I get something like Figure 5.

There is no new information in Figure 5. It’s just a setup for Figures 6 and 7.

Let’s now start slicing up the time response selectively to see what frequencies are contained in the output of the filter at what time. We’ll start by just taking the first and second samples of the impulse at the output, shown in Figure 6.

As you can see in Figure 6, if I remove the ringing that comes after the impulse, then the response of the signal has an almost-flat magnitude response and a gain of about 2 dB or so. This should not come as a surprise, since it’s almost an impulse. The only real difference between the portion that I’ve used and a real impulse is that the second value is not 0. So far so good…

Let’s look at the remainder of the time response. This is shown in Figure 7.

Figure 7 shows something interesting. We see the response of a band-pass filter with a centre frequency of 1 kHz, and a gain of 9 dB, which is the response of the filter after the initial impulse has passed.
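The slicing experiment can be imitated with a synthetic signal. The sketch below does not use the real filter: it builds a stand-in impulse response from an impulse plus a decaying 1 kHz cosine (the starting level and decay rate are made-up values), cuts it into the same two slices, and looks at the magnitude of each slice at three frequencies using a direct Fourier sum.

```python
import cmath
import math

FS = 48000          # sample rate in Hz (assumed)
FC = 1000.0         # ringing frequency in Hz, as in the text
N = 4800            # 100 ms of samples

def synthetic_response(amplitude=0.1, decay_per_sample=0.998):
    """Impulse at Time = 0 plus a decaying cosine (hypothetical level and decay)."""
    ring = [amplitude * decay_per_sample ** n * math.cos(2 * math.pi * FC * n / FS)
            for n in range(N)]
    ring[0] += 1.0                       # the impulse itself
    return ring

def magnitude_db(signal, freq_hz):
    """Magnitude of the signal's Fourier sum at a single frequency, in dB."""
    w = -2j * math.pi * freq_hz / FS
    total = sum(x * cmath.exp(w * n) for n, x in enumerate(signal))
    return 20 * math.log10(abs(total))

h = synthetic_response()
head = h[:2] + [0.0] * (N - 2)           # only the first two samples
tail = [0.0, 0.0] + h[2:]                # everything after them

for f in (100.0, 1000.0, 10000.0):
    print(f"{f:7.0f} Hz  head {magnitude_db(head, f):+6.1f} dB  tail {magnitude_db(tail, f):+6.1f} dB")
```

With these made-up values, the first slice stays within a fraction of a dB across the three frequencies, while the remainder is far stronger at 1 kHz than at 100 Hz or 10 kHz, which is the band-pass shape described above.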

## What does this all mean!?

If we leave out one important thing for now, this means that a peaking filter that has a boost of 12 dB, an Fc of 1 kHz and a Q of 2 is actually the sum of two things:

• a through-put with a little gain (about 1 dB)
• a bandpass filter with a gain of about 9 dB

This is, in essence, true. You can create a peaking filter by adding a bandpass filter to a through-put. However, an important point to realise here is that the band pass signal essentially comes after the onset of the signal. In Part 3, we’ll talk about whether this is a problem – or, more accurately, when this might be a problem. For now, however, I’ll throw one more example at you.

Up to now, we’ve only looked at the example of a peaking filter with a boost. What happens when the filter has a cut instead?

Notice that a dip filter also rings in time after the initial impulse, but decays much faster than the equivalent boost. (I’ll have to be a bit more careful about my use of the word “equivalent”, actually – but I’ll straighten that out at the end of the series. To be continued…)

Okay, what’s going on here? A peaking filter with a boost is a through-put plus a bandpass. A dip filter is ALSO a through-put plus a somewhat quieter (sort-of) bandpass. This doesn’t make any sense.

Actually it doesn’t make any sense because there’s a piece of information that I’m leaving out – the phase of the ringing. Notice that, with the peaking filter, the decay portion starts positive and then goes negative initially. With the dip filter, the decay starts negative and goes positive. So, the previous paragraph should have read: “A peaking filter with a boost is a through-put PLUS a bandpass. A dip filter is ALSO a through-put MINUS a somewhat quieter (sort-of) bandpass.”

The phases of the decays of the bandpass portions are opposite for the two filters. Another way to think of this is that the ringing in the dip filter cancels the energy around 1 kHz in the initial impulse, whereas the ringing in the peak filter adds to it.
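A quick calculation shows why the bandpass is added for a peak and subtracted for a dip, and why it is quieter for the dip. This is a simplified sketch that assumes the through-put has a gain of exactly 1 (ignoring the small extra gain seen in the measurement) and that the bandpass has a gain of 1 at Fc before scaling. The function name is hypothetical.

```python
import math

def bandpass_gain(gain_db):
    """Linear gain for the bandpass so that through-put + bandpass hits gain_db at Fc."""
    return 10 ** (gain_db / 20) - 1

for gain_db in (12.0, -12.0):
    k = bandpass_gain(gain_db)
    sign = "PLUS" if k >= 0 else "MINUS"
    level_db = 20 * math.log10(abs(k))
    print(f"{gain_db:+.0f} dB filter: through-put {sign} a bandpass at {level_db:+.1f} dB")
```

For the +12 dB peak, the bandpass comes out at about +9.5 dB and is added. For the -12 dB dip, it comes out at about -2.5 dB and is subtracted.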

However, it’s really important to note for now that both filters – the peak and the dip – result in ringing in time.

# Filters and Ringing: Part 1

Let’s say that, for some reason, you want to apply an equaliser to an audio signal. It doesn’t matter why you want to do this: maybe you like more bass, maybe you need more treble, maybe you’re trying to reduce the audibility of a room mode. However, one thing that you should know is that, by changing the frequency response of the system, you are also changing its time response.

Now, before we go any farther, do NOT mis-interpret that last sentence to mean that a change in the time response is a bad thing. Maybe the thing you’re trying to fix already has an issue with its time response, and sometimes you have to fight fire with fire.

Before we start talking about filters, let’s talk about what “time response” means. I often work in a specially-built listening room that has acoustical treatments that are specifically designed and implemented to result in a very controlled acoustical behaviour. I often have visitors in there, and one of the things they do to “test the acoustics” is to clap their hands once – and then listen.

On the one hand (ha ha) this is a strange thing to do, because the room is not designed to make the sound of a single hand clap performed at the listening position sound “good” (whatever that means). On the other hand, the test is not completely useless. It’s a “play-toy” version of a very useful test we use to measure a loudspeaker called an impulse response measurement. The clap is an impulsive sound (a short, loud sound) and the question is “how does the thing you’re measuring (a room or a loudspeaker, for example) respond to that impulse?”

So, let’s start by talking about the two important reasons why we use an impulse.

## Time response

If a thing in a room makes a sound, then the sound radiates in all directions and starts meeting objects in its path – things like walls and furniture and you. When that happens, the surface it meets will absorb some amount of energy and reflect the rest, and this balance of absorbed-to-reflected energy is different at different frequencies. A cat will absorb high frequencies and low frequencies will just pass by it. A large flat wall made of gypsum will reflect high frequencies and absorb whatever frequency it “wants” to vibrate at when you thump it with your fist.

The energy that is absorbed is (eventually) converted to heat: that’s lost. The reflected energy comes back into the room and heads towards another surface – which might be you as well, but probably isn’t unless you’re in a room about the size of an ancient structure known to archeologists as a “phone booth”.

At your location, you only hear the sound that reaches you. The first part of the sound that you hear “immediately” after the thing made the noise, probably travelled a path directly from the source to you. Let’s say that you’re in a large church or an aircraft hangar – the last sound that you hear as it decays to nothing might be 5 seconds (or more!) after the thing made the noise, which means that the sound travelled a total of 5 sec * 344 m/s = 1.72 km bouncing around the church before finally arriving at your position.

So, if I put a loudspeaker that radiates simultaneously in all directions equally at all frequencies (audio geeks call this a point source) somewhere in a room, and I put a microphone that is equally sensitive to all frequencies from all directions (audio geeks call this an omnidirectional microphone) and I send an impulse (a “click”) out of the loudspeaker and record the output of the microphone, I’ll see something like this:

Some things to notice about that plot shown above

• There is some silence before the first sound starts. This is the time it takes for the sound to get from the loudspeaker to the microphone (travelling at about 344 m/s, and with an onset of about 30 ms, this means that the microphone was about 10.3 m away).
• There are some significant spikes in the signal after the first one. These are nice, clean reflections off some surfaces like walls, the floor or the ceiling.
• Mostly, this is a big mess, so it’s difficult to point somewhere else and say something like “that is the reflection off the coffee mug on the table over there, after the sound has already hit the ceiling and two walls on the way” for example…
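The distances mentioned so far (the 10.3 m to the microphone, and the 1.72 km of bouncing around a church) both come from the same multiplication: the speed of sound times the travel time. As a tiny sketch, using the same 344 m/s:

```python
SPEED_OF_SOUND_M_PER_S = 344.0           # the value used in the text

def distance_travelled_m(seconds):
    """Distance covered by sound in the given time, in metres."""
    return SPEED_OF_SOUND_M_PER_S * seconds

print(f"30 ms of silence: {distance_travelled_m(0.030):.1f} m")
print(f"5 s of decay:     {distance_travelled_m(5.0) / 1000:.2f} km")
```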

So, this shows us something about how the room responds to an impulse over time. The nice (theoretical) thing is that this is a plot of what will happen to everything that comes out of the loudspeaker, over time, when captured at the microphone’s position. In other words, if you know the instantaneous sound pressure at any given moment at the output of the point-source loudspeaker, then you can go through time, multiplying that value by each value, moment by moment, in that plot to predict what will come out of the microphone. But this means that the total output of the microphone is all of the sound that came out of the loudspeaker over the 1000 ms plotted there, with each moment individually multiplied by each point on the plot – and all added together.
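That multiply-and-add procedure is usually called convolution. Here is a minimal sketch of it in plain Python. The “room” is a made-up four-sample impulse response (direct sound plus two reflections), nothing like the real measurement, and the function name is mine.

```python
def convolve(source, impulse_response):
    """Each source sample launches a scaled copy of the impulse response; add them all up."""
    output = [0.0] * (len(source) + len(impulse_response) - 1)
    for n, x in enumerate(source):
        for k, h in enumerate(impulse_response):
            output[n + k] += x * h
    return output

room = [1.0, 0.0, 0.5, 0.25]             # hypothetical: direct sound, then two reflections
source = [1.0, -1.0]                     # two samples coming out of the loudspeaker

print(convolve(source, room))
```

Each sample at the microphone is a mix of the current loudspeaker output and reflections of earlier samples, which is the same thing that happens with the speech in the church.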

This may sound complicated, but think of a simpler example: When you’re sitting and listening to someone speak in a church, you can hear what that person just said, in addition to the reverberation (reflections) of what they said seconds ago. There is one theory that this is how harmony was invented: choirs in churches noticed that the reverb from the previous note blended nicely with the current note, and so chords were born.

## Frequency Response

There is a second really good reason for using an impulse to test a system. An impulse (in theory) contains all frequencies at the same level. This is a little difficult to wrap one’s head around (at least, it took me years to figure out why…) but let me try to explain.

Any sound is the combination of some number of different frequencies, each with some level and some time relationship. This means that I can start with the “ingredients” and add them together to make the sound I want. If I start with two frequencies: 1 Hz and 2 Hz and add them together, using cosine waves (a cosine wave is the same as a sine wave shifted by 90º, so that it is at its maximum at Time = 0), the result is as shown in Figure 2.

Let’s do this again, but increase the number to 5 frequencies: 1 Hz, 2 Hz, 3 Hz, 4 Hz, and 5 Hz.

You may notice that the peak at Time = 0 ms is getting bigger relative to the rest of the result. However, we get the same peak values at Time = -1000 ms and Time = 1000 ms. This is because the frequencies I’m choosing are integer values: 1 Hz, 2 Hz, 3 Hz, and so on. What happens if we use frequencies in between? Say, 0.1 Hz to 10 Hz in steps of 0.1 Hz, thus making 100 cosine waves added together? Now they won’t line up nicely every second, so the result looks like Figure 4.

Let’s get crazy. Figure 5 shows 10,001 cosine waves with frequencies of 0 to 100 Hz in steps of 0.01 Hz.

You may start to notice that the result of adding more and more cosine waves together at different frequencies is starting to look a lot like an impulse. It’s really loud at Time = 0 ms (whenever that is, but typically we think that it’s “now”) and it’s really quiet forever, both in the past and the future.
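You can try this at home with a few lines of Python. The sketch below adds up cosine waves at a given moment in time, for the set of 5 integer frequencies and for the set of 100 frequencies in steps of 0.1 Hz. The function name and the moments I picked are arbitrary.

```python
import math

def sum_of_cosines(frequencies_hz, time_s):
    """Add up cosine waves (each with an amplitude of 1) at one moment in time."""
    return sum(math.cos(2 * math.pi * f * time_s) for f in frequencies_hz)

integer_freqs = [float(k) for k in range(1, 6)]      # 1 Hz to 5 Hz
fine_freqs = [k * 0.1 for k in range(1, 101)]        # 0.1 Hz to 10 Hz in steps of 0.1 Hz

for t in (0.0, 0.37, 1.0):
    print(f"Time = {t:4.2f} s  5 waves: {sum_of_cosines(integer_freqs, t):+7.2f}"
          f"  100 waves: {sum_of_cosines(fine_freqs, t):+7.2f}")
```

At Time = 0, every cosine is at 1, so the total equals the number of waves. One second later, the 5 integer frequencies line up again, but the 100 finer-spaced ones have cancelled each other almost completely.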

So, the moral of the story here is that if you click your fingers and make a “perfect” impulse, one philosophical way to think of this is that, at the beginning of time, cosine waves, all of them at different frequencies, started sounding – all of them cancelling each other until that moment when you decided to snap your fingers at Time = 0. Then they all continue until the end of time, cancelling each other out forever…

Or, another way to think of it is simply to say “an impulse contains all frequencies, each with the same amplitude”.

One small point: you may have noticed in Figure 5 that the impulse is getting big. That one added up to 10,001 – and we were just getting started. Theoretically, a real impulse is infinitely short and infinitely loud. However, you don’t want to make that sound because an infinitely loud sound will explode the universe, and that will wreck your analysis… It will at least clip your input.

## Equalisation

Let’s take a simple example of an equaliser. I’ll use an EQ to apply a boost of 12 dB with a centre frequency of 1 kHz and a Q of 2. (Note that “Q” has different definitions. The one I’ll be using here is where the Q = Fc / BW, where BW is the bandwidth in Hz between the -3 dB points relative to the highest magnitude. If you want to dig deeper into this topic, you can start here.) That filter will have a magnitude response that looks like this:

As you can see there, this means that a signal coming into that filter at 20 Hz or 20 kHz will come out at almost exactly the same level. At 1000 Hz, you’ll get 12 dB more at the output than the input. Other frequencies will have other results.

The question is: “how does the filter do that, conceptually speaking?”

That’s what we’ll look at in the next part of this series.