Now we’ve seen that if we have a filter that results in either a peak or a dip in the magnitude response, we’ll also result in the signal ringing in time. We’ve also seen that the frequency of the ringing is the centre frequency of the filter. Now let’s dig a little deeper into the behaviour of that ringing; or, more specifically its decay characteristics.
We’ll repeat the process from Part 2: measure the impulse response of a peaking filter where Fc = 1 kHz, gain = +12 dB, and Q = 2. However, this time I’ll look at the time response with a different scaling. Instead of plotting the linear value over time, I’ll convert each instantaneous value to dB and plot that. This looks like Figure 1.
The important thing to notice here is that, when I plot the instantaneous amplitude in decibels (in other words, on a logarithmic scale), the decay is a straight line with a slope.
Let’s get two things out of the way here. This isn’t really decibels, because decibels requires some time averaging. Also, I’m actually plotting the absolute value of the impulse response in a decibel scale, because if I try to calculate the log of a negative number, things get ugly. This means that the math I’m actually using to create the bottom plot is
20 * log10(abs(signal))
If I draw a line across the tops of the bumps in that plot, I can look at the decay of the filter’s ringing as in Figure 2.
For this filter, the decay rate of the ringing is -1360 dB per second (which is very fast). Let’s change some parameters and see what happens.
If I increase the gain of the filter without changing the Fc or the Q, I get the following:
I could plot lots more of these so that you start to see a pattern, but I’ll jump to the punch lines and you can use the plots above to check that things make sense.
If I have a filter that is using a definition of Q = Fc / BW (where BW is the distance between the -3 dB points down from the maximum), then:
Changing the gain does not change the rate of the decay.
Changing the Q will change the slope of the decay inversely proportionally (if we’re measuring the slope in dB/sec). For example, if I multiply the Q by 2, the ringing decays twice as slowly. If I multiply the Q by 10, the ringing will take 10 times longer to decay to the same level.
Changing the frequency will change the slope of the decay proportionally (if we’re measuring the slope in dB/sec). For example, if I multiply the frequency by 2, the ringing will decay twice as fast.
Let’s talk about the last of these first, since it’s the easiest to understand conceptually. In the plots above, I’m showing the time in seconds. So, the higher the frequency, the more cycles I’m showing in the same plot. However, if I were plotting time in cycles of the cosine wave instead, the slope would be the same regardless of frequency.
In other words, the level of the ringing decays by the same amount per number of cycles of the cosine wave.
This is why, if you count the number of “bumps” in the dB plots in Figure 2 and 5, you’ll see that they are the same number. It takes about 12 cycles to get down to -100 dB, but the shorter the cycles (because the frequency is higher) the faster you get there when measuring in seconds.
If you throw a rock into a pond on a windless day, you’ll see the ripples moving away in an expanding circle from the place where the rock hit. The ripples are places on the water where the water is either higher or lower than where it was before you hit the rock. The water itself only moves up and down, but the waves expand sideways. (You can see this if there is something floating on the water, for example – it bobs up and down as the waves go by.)
A similar thing happens when you pluck a guitar string. The point where your finger plucked is the same as the point where the rock landed in the water, and waves radiate away from that place on the string in two directions (because there are only two directions to travel in on a string: this way and that way). However, when those waves reach the end of the string, they reflect and come back in the opposite direction.
In both cases, the water and the guitar string, the wave has some speed at which it travels. It’s slow enough on the water for you to watch it, but it’s much too fast on a guitar string. In fact, it’s so fast that, when you pluck it, the wave travels to the end of the string, reflects in the opposite direction, hits the other end of the string, reflects again, and gets back to where you plucked it in about 1/82nd of a second if it’s the low E string. Since the wave doesn’t stop there – it keeps going, repeating the back-and-forth journey along the length of the string every 1/82nd of a second, then we hear a note with a fundamental frequency of 82 Hz (82 cycles per second): a low E.
That ringing that happens on the guitar string will happen no matter how you start the movement on it. You could hit the string with a chopstick, you could just thump the side of the guitar with your fist, you could even stand next to the guitar and cough loudly. All of these things will “inject” energy into the string, causing it to move, and the wave starts banging back and forth.
The rate of repetition is dependent on two things: the length of the string and the speed of the wave. The speed of the wave is dependent on two things: the mass of the string (e.g. how heavy is 1 m of it?) and the tension (how tightly is it stretched?) Increase the tension, and you increase the speed of the wave. Decrease the mass and you increase the speed of the wave. Increase the speed of the wave, and the repetition takes less time, so you hear a higher note.
That frequency at which the string will naturally ring is called a resonance. A child on a swing will go back and forth at the same rate (number of times per second) no matter how gently or forcefully you push them – apply energy, and the system will resonate.
Now, let’s think about that push of the child, the rock hitting the water, or the pluck of the guitar string. All of those things are a short injection of energy: a kind of impulse, and the way the child, the water, or the string behaves afterwards is its impulse response – how it responds to that impulse.
But here’s a strange thing to consider. This means that the note (the frequency) that you hear from the guitar string was one of the many frequencies in the initial pluck itself.
So, another way to think of this is that, by plucking the string, you inject a signal with all frequencies in it, and all of those frequencies decay (“die away”) very quickly except for one.
Okay, okay, if we’re going to be pedantic, I should be including not only the fundamental frequency but all of the additional harmonics; typically multiples of that frequency. But we don’t need to complicate things with the truth at the moment…
What does this have to do with filters?
From a “big picture” point of view, a guitar string is a filter. I feed in some signal (the pluck) and I get out a modified version of that signal (the note ringing). From the same perspective, a filter in an equaliser is the same: I feed in a signal (music) and I get out a modified version of it (the same music, but slightly louder at 1 kHz, for example). What’s interesting is that the two things basically work the same way.
Let’s take the example of the filter at the end of Part 1: a peaking filter with a boost of 12 dB at 1 kHz, with a Q of 2. If I feed in a sine wave (which only contains energy at 1 frequency) at a very low frequency (say, 100 Hz or lower) then the level of the output will equal that of the input. If I do the same with a very high frequency (say, 10 kHz) then the level of the output will also equal that of the input. However, if I feed in a sine wave at 1 kHz, the output will be 4 times louder than the input (+12 dB = 4 time the amplitude because 20*log10(4) = 12-ish).
At some other frequency around 1 kHz, I’ll get a different answer. However, this is a VERY long and tedious way to measure the magnitude response of the filter. Another option is to measure its impulse response.
If I feed the input of the filter with an impulse (which is a sound that contains all frequencies at the same level, as we saw in Part 1), and look at the filters output in time, it might look like this:
Notice that the impulse looks like an impulse at Time = 0, but then something extra happens afterwards – like a guitar string ringing in time. If I zoom in vertically and look at the same plot, it will look like Figure 3.
And if we zoom in horizontally as well, it will look like this.
So, as you can see there, it’s almost as if we kept the impulse, and then just added a cosine wave with a period (a repetition time) of 1 ms, starting at Time = 0 and decaying over time. In fact, that’s exactly what the filter does.
Time response to Frequency response
The excuse I gave above for sending an impulse through the filter (instead of sine waves) was that this will be a faster way to measure its response. The time response of the filter is already done. We can see that in the figures above. But how do we see the filter’s frequency response? This is done using a clever bit of math called a Fourier Transform, which lets you take a signal in time, and analyse its content by frequency. I won’t explain that here, but if you’re interested in how it works, you can start by reading this.
If I take the total impulse response (also known as a time response measurement) of the filter: in other words, I send in an impulse, I record the output and don’t stop recording until the ringing has decayed to a level low enough that I no longer care (for the purposes of this discussion, at least). Then, I do a Fourier Transform of the recording, I get something like Figure 5.
There is no new information in Figure 5. It’s just a setup for Figures 6 and 7.
Let’s now start slicing up the time response selectively to see what frequencies are contained in the output of the filter at what time. We’ll start by just taking the first and second samples of the impulse at the output, shown in Figure 6.
As you can see in Figure 6, if I remove the ringing that comes after the impulse, then the response of the signal has an almost-flat magnitude response and a gain of about 2 dB or so. This should not come as a surprise, since it’s almost an impulse. The only real difference between the portion that I’ve used and a real impulse is that the second value is not 0. So far so good…
Let’s look at the remainder of the time response. This is shown in Figure 7.
Figure 7 shows something interesting. We see the response of a band-pass filter with a centre frequency of 1 kHz, and a gain of 9 dB, which is the response of the filter after the initial impulse has passed.
What does this all mean!?
If we leave out one important thing for now, this means that a peaking filter that has a boost of 12 dB, an Fc of 1 kHz and a Q of 2 is actually the sum of two things:
a through-put with a little gain (about 1 dB)
a bandpass filter with a gain of about 9 dB
This is, in essence, true. You can create a peaking filter by summing a bandpass filter to a through-put. However, an important point to realise here is that the band pass signal essentially comes after the onset of the signal. In Part 3, we’ll talk about whether this is a problem – or, more accurately, when this might be a problem. For now, however, I’ll throw one more example at you.
Up to now, we’ve only looked at the example of a peaking filter with a boost. What happens when the filter has a cut instead?
Notice that a dip filter also rings in time after the initial impulse, but decays much faster than the equivalent boost. (I’ll have to be a bit more careful about my use of the word “equivalent”, actually – but I’ll straighten that out at the end of the series. To be continued…)
Okay, what’s going on here? A peaking filter with a boost is a through-put + a bandpass. A dip filter is ALSO a through-put + a somewhat quieter (sort-of) bandpass. This doesn’t make any sense.
Actually it doesn’t make any sense because there’s a piece of information that I’m leaving out – the phase of the ringing. Notice that, with the peaking filter, the decay portion starts positive and then goes negative initially. With the dip filter, the decay starts negative and goes positive.
So, the phases of the decays of the bandpass portions are opposite for the two filters. Another way to think of this is that the ringing in the dip filter cancels the energy around 1 kHz in the initial impulse, whereas the ringing in the peak filter adds to it.
However, it’s really important to note for now that both filters – the peak and the dip result in ringing in time.
Let’s say that, for some reason, you want to apply an equaliser to an audio signal. It doesn’t matter why you want to do this: maybe you like more bass, maybe you need more treble, maybe you’re trying to reduce the audibility of a room mode. However, one thing that you should know is that, by changing the frequency response of the system, you are also changing its time response.
Now, before we go any farther, do NOT mis-interpret that last sentence to mean that a change in the time response is a bad thing. Maybe the thing you’re trying to fix already has an issue with its time response, and sometimes you have to fight fire with fire.
Before we start talking about filters, let’s talk about what “time response” means. I often work in an especially-built listening room that has acoustical treatments that are specifically designed and implemented to result in a very controlled acoustical behaviour. I often have visitors in there, and one of the things they do to “test the acoustics” is to clap their hands once – and then listen.
On the one hand (ha ha) this is a strange thing to do, because the room is not designed to make the sound of a single hand clap performed at the listening position sound “good” (whatever that means). On the other hand, the test is not completely useless. It’s a “play-toy” version of a very useful test we use to measure a loudspeaker called an impulse response measurement. The clap is an impulsive sound (a short, loud sound) and the question is “how does the thing you’re measuring (a room or a loudspeaker, for example) respond to that impulse?”
So, let’s start by talking about the two important reasons why we use an impulse.
If a thing in a room makes a sound, then the sound radiates in all directions and starts meeting objects in its path – things like walls and furniture and you. When that happens, the surface it meets will absorb some amount of energy and reflect the rest, and this is balance of absorbed-to-reflected energy is different at different frequencies. A cat will absorb high frequencies and low frequencies will just pass by it. A large flat wall made of gypsum will reflect high frequencies and absorb whatever frequency it “wants” to vibrate at when you thump it with your fist.
The energy that is absorbed is (eventually) converted to heat: that’s lost. The reflected energy comes back into the room and heads towards another surface – which might be you as well, but probably isn’t unless you’re in a room about the size of an ancient structure known to archeologists as a “phone booth”.
At your location, you only hear the sound that reaches you. The first part of the sound that you hear “immediately” after the thing made the noise, probably travelled a path directly from the source to you. Let’s say that you’re in a large church or an aircraft hangar – the last sound that you hear as it decays to nothing might be 5 seconds (or more!) after the thing made the noise, which means that the sound travelled a total of 5 sec * 344 m/s = 1.72 km bouncing around the church before finally arriving at your position.
So, if I put a loudspeaker that radiates simultaneously in all directions equally at all frequencies (audio geeks call this a point source) somewhere in a room, and I put a microphone that is equally sensitive to all frequencies from all directions (audio geeks call this an omnidirectional microphone) and I send an impulse (a “click”) out of the loudspeaker and record the output of the microphone, I’ll see something like this:
Some things to notice about that plot shown above
There is some silence before the first sound starts. This is the time it takes for the sound to get from the loudspeaker to the microphone (travelling at about 344 m/s, and with an onset of about 30 ms, this means that the microphone was about 10.3 m away.
There are some significant spikes in the signal after the first one. These are nice, clean reflections off some surfaces like walls, the floor or the ceiling.
Mostly, this is a big mess, so it’s difficult to point somewhere else and say something like “that is the reflection off the coffee mug on the table over there, after the sound has already hit the ceiling and two walls on the way” for example…
So, this shows us something about how the room responds to an impulse over time. The nice (theoretical) thing is that this is a plot of what will happen to everything that comes out of the loudspeaker, over time, when captured at the microphone’s position. In other words, if you know the instantaneous sound pressure at any given moment at the output of the point-source loudspeaker, then you can go through time, multiplying that value by each value, moment by moment, in that plot to predict what will come out of the microphone. But this means that the total output of the microphone is all of the sound that came out of the loudspeaker over the 1000 ms plotted there, with each moment individually multiplied by each point on the plot – and all added together.
This may sound complicated, but think of it as a more simple example: When you’re sitting and listening to someone speak in a church, you can hear what that person just said, in addition to the reverberation (reflections) of what they said seconds ago. There is one theory that this is how harmony was invented: choirs in churches noticed that the reverb from the previous note blended nicely with the current note, and so chords were born.
There is a second really good reason for using an impulse to test a system. An impulse (in theory) contains all frequencies at the same level. This is a little difficult to wrap ones head around (at least, it took me years to figure out why…) but let me try to explain.
Any sound is the combination of some number of different frequencies, each with some level and some time relationship. This means that, I can start with the “ingredients” and add them together to make the sound I want. If I start with two frequencies: 1 Hz and 2 Hz and add them together, using cosine waves (a cosine wave is the same as a sine wave that starts 90º late), the result is as shown in Figure 2.
Let’s do this again, but increase the number to 5 frequencies: 1 Hz, 2 Hz, 3 Hz, 4 Hz, and 5 Hz.
You may notice that the peak at Time = 0 ms is getting bigger relative to the rest of the result. However, we get the same peak values at Time = -1000 ms and Time = 1000 ms. This is because the frequencies I’m choosing are integer values: 1 Hz, 2 Hz, 3 Hz, and so on. What happens if we use frequencies in between? Say, 0.1 Hz to 10 Hz in steps of 0.1 Hz, thus making 100 cosine waves added together? Now they won’t line up nicely every second, so the result looks like Figure 4.
Let’s get crazy. Figure 5 shows 10,000 cosine waves with frequencies of 0 to 100 Hz in steps of 0.01 Hz.
You may start to notice that the result of adding more and more cosine waves together at different frequencies is starting to look a lot like an impulse. It’s really loud at Time = 0 ms (whenever that is, but typically we think that it’s “now”) and it’s really quiet forever, both in the past and the future.
So, the moral of the story here is that if you click your fingers and make a “perfect” impulse, one philosophical way to think of this is that, at the beginning of time, cosine waves, all of them at different frequencies, started sounding – all of them cancelling each other until that moment when you decided to snap your fingers at Time = 0. Then they all continue until the end of time, cancelling each other out forever…
Or, another way to think of it is simply to say “an impulse contains all frequencies, each with the same amplitude”.
One small point: you may have noticed in Figure 5 that the impulse is getting big. That one added up to 10,001 – and we were just getting started. Theoretically, a real impulse is infinitely short and infinitely loud. However, you don’t want to make that sound because an infinitely loud sound will explode the universe, and that will wreck your analysis… It will at least clip your input.
Let’s take a simple example of an equaliser. I’ll use an EQ to apply a boost of 12 dB with a centre frequency of 1 kHz and a Q of 2. (Note that “Q” has different definitions. The one I’ll be using here is where the Q = Fc / BW, where BW is the bandwidth in Hz between the -3 dB points relative to the highest magnitude. If you want to dig deeper into this topic, you can start here.) That filter will have a magnitude response that looks like this:
As you can see there, this means that a signal coming into that filter at 20 Hz or 20 kHz will come out at almost exactly the same level. At 1000 Hz, you’ll get 12 dB more at the output than the input. Other frequencies will have other results.
The question is: “how does the filter do that, conceptually speaking?”
That’s what we’ll look at in the next part of this series.
Back in this posting, I talked about biquads and their use in digital signal processing for making linear filters (what most of us call “equalisers”). Part of that explanation showed that a biquad is formed of a feed-forward section and a feed-back section, and you could swap the order of these two and get the same results. In a “Direct Form 1” implementation, the feed-forward comes first, as shown in Figure 1.
in the “Direct Form 2” implementation, the feed-back comes first, as shown in Figure 2.
There are advantages and disadvantages to each of these implementations, depending on things like how the rest of your system is implemented, and what, exactly you’re expecting the biquad to do.
However, for this posting, we’re going to “zoom in” a bit to the feed-back portion of the above diagrams. This portion of the biquad provides the “poles” in the Z-plane, as I described in the “Intuitive Z-plane” series of postings last week.
If I separate the feed-back portion of the above figures, it would look like Figure 3:
The locations of the poles (and therefore the magnitude response) of this portion of the filter are dependent on the gains at the outputs of the two 1-sample delays. What happens when these gains do not have an infinite resolution (which they can’t, because everything in a digitally-represented world is quantised to a finite number of steps)?
Before we go any further with this, I have to put in a reminder that quantisation (or “rounding”) in a DSP world is done in binary – not decimal. So, the quantisation that I’m about to do isn’t the same as stopping a couple of digits after the decimal…
In a world with infinite resolution, I can set the values of those two gains to be anything I want, and therefore the poles in my system can be anywhere within the circle. (We’ll assume that we don’t want a pole on the circle because that would result in a gain of ∞ dB, which is very loud.) However, let’s say that we quantise the gain values to a 4-bit binary value, where the first bit is reserved for the +/- indicator. This means that we only have +/- 2^3 possibilities, or 8 values, one of which is +0, and another is -0 (which is the same thing…). So, in other words, we have 7 possible negative values, 7 possible positive values, and 0.
The end result of this is that the poles have a limited number of locations where they can be placed. For the 4-bit quantisation described above, the resulting locations look like Figure 4.
Remember that we’re not talking about quantisation of the audio signal – it’s quantisation of the gain coefficients inside the biquad that will have an impact on the response of the filter.
Of course, it would be crazy to implement a biquad using only 4-bit quantisation for the gain coefficients. However, the point of this posting is not to show that biquads suck. It’s only to show one possibly important aspect of them if you’re a DSP engineer – but we’re getting there.
Just for fun, let’s increase the resolution of the system to 6 bits:
… or to 8 bits:
Any higher than this and the pattern will just get so dense that it’ll turn black, so I’ll stop.
At this point, you may be asking “so what?” The answer to this very important question lies on the far right side of all of those graphs. Notice that there aren’t any locations near the point on the graph where X=1 and Y=0. You may remember from Figure 7 in this posting that I did last week, that that’s the point on the Z-plane that corresponds to very low frequencies.
This means that, if you want to create a digital filter, and you need a pole near 0 Hz, you’re going to run into some trouble with a Direct Form implementation of the biquad. Yes, the higher the bit depth of the gain coefficients, the closer you can get, but this might not be the best way to do things.
There is another option for implementing your feed-back portion of the biquad. You can use a design called a Coupled Form instead, which is shown in Figure 7 (compare it to Figure 3).
Notice that you still have two 1-sample delays, however there are now 4 gains instead of 2. How can this be better? Well, in a system with infinite resolution on the gain coefficients, it’s not. Given the appropriate choice of gain values, this implementation will do exactly the same thing as the Direct Form implementation if your resolution is infinite.
However, if you have a limited resolution, then the available locations of the poles on the Z-plane are very different. Let’s use the 4-bit, 6-bit, and 8-bit quantisations of the gain values again: these are shown in Figures 8 to 10.
As you can see in Figures 8 through 10, the Coupled Form implementation, given the same resolution on the gain coefficients, will give you much better placement opportunities for the poles in the low frequency region than the Direct Form implementation.
Of course, in all of these examples, I’m only showing up to an 8-bit word, and a typical DSP runs uses a lot more than 8 bits for the gain coefficients. So, it’s possible that, in the real world, the actual resolution is high enough that this is of no concern whatsoever.
However, if you’re building a very-low-frequency filter an if the magnitude response isn’t exactly what you’re looking for, this might help you get a little closer to your goal.
It’s important to point out here that the quantisation of the locations of the zeros is not the same as this. Someday, I’ll come back to this and plot the expected vs. actual magnitude responses of some biquads where I’ve quantised the zeros and poles, just to see how badly things go wrong when they do…
This posting is a simple summary of the discussion of a section called “Poles of Quantized Second-Order Sections” in “Discrete-Time Signal Processing” by Alan V. Oppenheim and Ronald W. Schafer.
The Coupled Form implementation was introduced by C.M. Rader and B. Gold in their paper called “Digital Filter Design Techniques in the Frequency Domain” from the Proceesings of the IEEE, Vol. 55, pp. 149 – 171 , from February, 1967.
If you’ve read through the first four parts of this series, then you’re already at a point where you can intuitively understand what’s going on. We just have a couple of details to take care of before finishing off.
Firstly, the plots showing the zeros and poles in the figures you’ve been looking at plots of the “Z-plane” or “Complex-plane“. As I said at the start, we’re only trying to get to an intuitive understanding of these plots – so I’m not going to get into complex numbers, or even much math (apart from what you’ll see below… which isn’t very complicated, and avoids complex numbers).
When I’m developing a new DSP algorithm, I use an application called Max from cycling74.com. Figure 1 shows a screenshot from Max, where I’m using an object to calculate the biquad coefficients to make a low pass filter, as you can see. I’ve then connected the output of that object (it looks like a magnitude response) to a Z-plan representation that shows me the same thing in a different way.
You may notice that this plot has two poles, one at (0, 0.408) and the other at (0, -0.408). In fact there are two zeros there as well, but they’re situated in the same place, on “on top” of the other, at (-1, 0). This is always true for a biquad – there are always two zeros and two poles. Sometimes, they’re located in the same place, sometimes not, sometimes they’re placed symmetrically, sometimes not, depending on the filter, as we’ll see below.
Let’s look at that Z-plane representation in 3-dimensions:
So, as you would now expect, the poles pull up the edge of the circle, and the zeros (both in the same place) pull down, giving the red line the height that it has.
Now, think back to this Figure from earlier in the series:
If you therefore look at Figure 3, which is like looking at Figure 4 from the top, you’ll notice that the height of the red line (the edge of the circle is high on the left (in the low frequencies) and drops as you go to the right (the high frequencies). This is the magnitude response that’s shown on the top of Figure 1. The only difference is that it’s on a linear scale instead of a logarithmic scale, so the shape looks a little weird.
Let’s do another one:
Hopefully, now you are able to look at a Z-plane representation of a filter and think about the effect of the poles and zeros on the edge of the circle, and therefore get a rough idea of the magnitude response of the filter…
If not, I apologize for wasting your time. On the other hand, if you’re in a life-threatening situation, this knowledge probably wouldn’t help you anyway… Very few people have gotten a critical injury in a biquad accident.
How I did it
If you want to make these plots for yourself, the math is pretty simple.
Start by choosing the frequency, which will be a point on the circle. You then find the four distances from the zeros and poles to that point (I’ve indicated those distances in Figure 8 with the variables z1, z2, p1, and p2.) This can be done using the Pythagorean theorem.
To find the gain of the filter at the frequency, you divide the sum of the zeros’ distances by the sum of the poles’ distances. In other words:
(z1 + z2) / (p1 + p2)
That will give you the result as a linear value. If you then want to convert it to decibels, like I’ve done, you do a little extra math like this:
20 * log10 ( (z1 + z2) / (p1 + p2) )
That’s it! You just need to do repeat that math for each frequency that you’re interested in, and you’re done!
I ended Part 3 by saying that DSP engineers think of the frequency scale as a circle rather than as a straight line. The questions are “Why do they think like this? What’s wrong with them?” Although I can’t answer the second question, the answer to the first question is fundamentally simple.
A DSP engineer is not really interested in what a filter does. She or he is interested in how it filters. A normal magnitude response plot shows us mortals the result of what’s happened to the audio after it’s gone through a filter (or a processor in general). Someone making that filter (or system) needs to know how it’s working instead.
So far, in this series, we’ve seen the following:
Digital audio filters are made with feed-forward and feed-back delays with different gains.
Feed-forward delays make narrow dips (and wide peaks) in the magnitude response
Feed-back delays make narrow peaks (and wide dips) in the magnitude response
DSP engineers think of frequency on a circle instead of a straight line
DSP engineers also want to see plots of how a filter works instead of its result on the audio signal.
Let’s put all of this together.
We’ll draw the circle showing the frequency scale, but then rotate the view to see it in three dimensions. For example, I can pretend to make the surface of of the circle out of a rubber sheet that can be pulled upwards (like a tent) or downwards (like a funnel), whilst always maintaining a circular edge.
If I want to pull the tent upwards, I’ll use a “pole” to do it. That pole has an infinite height (we’re going to need some very stretchy rubber). If I want to pull the funnel downwards, I’ll use something I’ll call a “zero“. (I am not going to go into why zeros and poles are called that, so as to avoid doing too much math.)
So, if I were to put a zero in the middle of the circle, its 2D representation would look like Figure 1 (notice the red circle in the middle showing where the zero is placed), and the 3D version would look like Figure 2:
If I were to put the pole (indicated by a red ‘x’) in the middle of the circle instead, then the result would look like Figures 3 and 4.
So far so good… If we were to rotate Figures 2 or 4 and look at the red line that I’ve drawn around the edge, we’d see that it’s flat with a height of 0 (on the vertical scale) all the way around. This is because I’ve carefully placed the zero or the pole at the exact middle of the circle, so it’s pulling equally on all points of the edge of the “tent” or the “funnel”.
However, what would happen if I moved the zero or the pole away from the middle? Some examples of this for a zero moved to the location (-0.75, 0) are shown in Figures 5 to 7, below.
As you can see in Figure 7, when the zero is moved away from the centre of the circle, it pulls downwards on the closer edge (notice how the red line is lower than the black line which has a constant height of 0). However, it also doesn’t pull downwards as much on the opposite side of the circle (notice how the red line is higher than the black line on the left side).
Of course, if I were to do the same thing with a pole, everything would behave symmetrically, as shown in Figures 8 to 10.
We’re almost finished… One more posting to go to wrap up.
Generally speaking, digital filters work by taking an audio signal, delaying it, changing the level, and adding the result back to the signal itself.
I then showed an example of a simple digital filter like this:
If we use the filter in Figure 1, set the delay to 0, and set the gain to 1, then the output is just the input signal added to itself, so it’s two times the amplitude or about 6 dB louder.
If we leave the gain at 1, and set the delay to something else – let’s say, 1 ms, for example – then, in the very low frequencies (say, 1 Hz) the phase difference caused by the 1 ms delay is almost nothing – therefore the output will be +6 dB. At 500 Hz, however, the 1 ms delay is equal to a 180º phase shift, so the output of the delay will always be opposite in polarity with the non-delayed signal. Therefore, at 500 Hz, this filter will have no output. At 1 kHz, the output will be +6 dB again, because 1 ms = 360º at 1 kHz. At 1.5 kHz, there’s no output (540º phase shift), at 2 kHz, we’re back to +6 dB, and so on all the way up. The result is a magnitude response that looks like this:
If I used the same delay time on the same filter structure, but set the gain to something between 0 and 1 – let’s make it 0.75, for example, then the overall shape would be the same, but the effect would be less, as shown in Figure 3.
If we make the gain a negative value, then the overall shape remains the same, but the high points and the low points swap places because the delayed signal is now cancelling where it was adding, and vice versa.
Let’s think about this intuitively. If my audio signal cannot exceed a value of 1 (which is normally the way we work… a full-scale signal ranges from -1 to 1) and the gain of the delay output in the filter in Figure 1 also ranges from -1 to 1, then the maximum possible output of the filter is 2.
If I had two delay lines and I were summing all three signals (the input and the two delayed signals) and the gains were still limited within the range of -1 to 1, then the maximum possible output would be 3…
However, the minimum possible output level (not the minimum possible output value) would be no signal (as in the case of a 500 Hz input in Figure 2. This is equivalent to an output of -∞ dB.
If I generalise this, then I can say that if your filter is built using ONLY the summed outputs of feed-forward delays, then the maximum possible output can be easily calculated, and the minimum possible output is no signal.
Still generalising: notice that the “bumps” in the above frequency responses are smooth and rounded, and the dips are pointy notches.
What happens if the filter uses feed-back instead of feed-forward?
Let’s set the delay time to 1 ms again, and set the gain to 0.99. I chose 0.99 instead of 1 because this means that each time the signal re-circulates back, it will get a little bit quieter. If I had set the gain to 1, then the delay would keep “echoing” forever. If I made it greater than 1, then the output would get louder on each re-circulation, and things would get very loud, sooner or later…
So, if Delay = 1 ms and Gain = 0.99 in the filter in Figure 5, the resulting magnitude response looks like Figure 6.
There are some things to notice in Figure 6.
Firstly, notice that the overall “shape” of the curve is upside-down relative to the one in Figure 2. The rounded bits are on the bottom and the pointy bits are on the top. This means that instead of having very narrow notches, you have very narrow resonances that are “singing” like a collection of sinusoidal waves, one at the frequency of each peak.
Secondly, notice that the peaks and the dips are in the same places as in Figure 2. In both cases, the Delay = 1 ms and the gain is positive, so the frequencies that are boosted are the same in both cases. For example, both have a peak at 1 kHz, and a dip at 500 Hz.
Thirdly, notice that the overall level is much, much higher. 40 dB is a LOT louder than 6 dB – this is because the sum of all those re-circulated signals echoing over and over in the filter add up to something loud over time.
If I reduce the gain, but keep it positive, then (just like in the case with the feed-forward filter) the shape of the magnitude response stays the same, it’s just reduced in effect.
If I did the same thing, but set the gain to a negative number (say, -0.99) instead, then each time the signal re-circulates, it flips polarity. The resulting magnitude response looks like Figure 8.
Notice that this is related to the magnitude response in Figure 4 – we have less output in the low end, and now the first peak is at 500 Hz instead of 1 kHz.
If I generalise this one, then I can say that if your filter is built using ONLY the summed outputs of feed-back delays, then the peaks are much higher than with a feed-forward design because they’re resonating.
Still generalising: notice that the “bumps” in the above frequency responses are pointy (because they’re resonances), and the dips are smooth and rounded.
The summary (for now)
Repeating myself, because this is the take-away information for this posting:
If your filter is built using ONLY the summed outputs of feed-forward delays, then:
the maximum possible output can be easily calculated
the minimum possible output is no signal
the “bumps” in the above frequency responses are smooth and rounded
the dips are pointy notches.
if your filter is built using ONLY the summed outputs of feed-back delays, then:
the peaks are much higher in level than with a feed-forward design because they’re resonating
the “bumps” in the above frequency responses are pointy because they’re resonating
If you’ve read the three introductory parts of this series, linked above; and if you’re still awake, then we are ready to start putting things together and jumping to incorrect conclusions…
Let’s say that you’ve been hired to specify a digital audio system for some reason (we’ll assume that it’s an LPCM system – nothing exotic). Using the information I’ve told you so far, you can make two decisions in your specification:
You select a bit depth to be enough to give you the dynamic range you desire. In this case, “dynamic range” means the “distance” in level between the loudest sound you can record / store / transmit (I isn’t say what the “digital audio system” was going to be used for) and the inherent noise floor of the system. If you’re recording the background noise on an airplane while it’s in flight, you don’t need a big dynamic range, because it’s always loud, and never changes. However, if you’re recording a Japanese Taiko Drummer group, you’ll need a huge usable dynamic range because the loud parts of the performance are a LOT louder than the quietest parts.
As we saw in Part 3, an LPCM digital audio system cannot record any audio that has a frequency higher than 1/2 the sampling rate. So, you select a sampling rate that is at least 2x the highest frequency you’re interested in. For example, if you believe the books that say you can hear from 20 Hz to 20,000 Hz, then you might decide that your sampling rate has to be at least 40,000 Hz. On the other hand, if you’re making a subwoofer that you know will never be fed a signal above 120 Hz, then you don’t need a sampling rate higher than 240 Hz.
Don’t get angry yet. I’m just keeping these numbers simple to make the math easy. Later on, I’ll explain why what I just said might not be correct.
I just jumped to at least three conclusions (probably more) that are going to haunt me.
The first was that my “digital audio system” was something like the following:
As you can see there, I took an analogue audio signal, converted it to digital, and then converted it back to analogue. Maybe I transmitted it or stored it in the part that says “digital audio”.
However, the important, and very probably incorrect assumption here is that I did nothing to the signal. No volume control, no bass and treble adjustments… nothing.
We assumed above that we can define the system’s dynamic range based on the dynamic range of the audio signal itself. However, this makes the assumption that the noise floor of the digital system and the noise floor of your audio signal are identical, which is probably not true. As we saw in Part 2, the noise generated by TPDF dither is white – it has the same probability of having a given amount of energy per Hertz. Since we hear sound logarithmically (meaning that, to us, octaves are equal widths. Equal spacings in Hz are not.) This means that the noise sound “bright” to us – because there’s just as much energy in the top octave (say, 10 kHz to 20 kHz, if you believe the books) as there is in all other frequencies combined from 0 Hz up to 10 kHz.
If, however, the noise floor in your concert hall where the taiko drummers are playing is caused by the air conditioning system, then this noise will be a lot louder in the low frequencies than the the highs – which is not the same.
Therefore it’s too simplistic to say “the noise floor of the digital system” and the “noise floor of the signal” – since these two noise floors are different. (As Steven Wright said: “It doesn’t matter what temperature the room is, it’s always room temperature.”)
As we’ll see later, if you’re going to do anything to the signal while it’s in the “digital domain”, then you need to take that into consideration when you’re deciding on your sampling rate. It’s not enough to say “useful audio bandwidth times 2” because there are some side effects that need to be remembered…
However, counter-intuitively, it could be that, in order to improve your system, you’ll want to make the sampling rate LOWER instead of HIGHER – so this is not a simple case of “more is better”.
We’ll get to that topic later. For now, I’ll leave you in suspense.
One thing we saw in Part 3 was that, if we have an audio signal with energy at a frequency higher than 1/2 the sampling rate, and if that signal gets into the analogue-to-digital converter (ADC), then the output of the ADC will contain an error. We’ll get out energy at frequencies that were not in the original, due to the effect called “aliasing“.
Once that’s in the digital audio signal, there’s no removing it, so we need to make sure that the too-high-frequency signals don’t get into the ADC’s input in the first place. This is done using a low-pass filter that (in theory) removes all energy in the signal above the Nyquist frequency (which is equal to 1/2 the sampling rate). Since that low-pass filter prevents aliasing, we call it an anti-aliasing filter. Normally, these days, that antialiasing filter is built into the ADC itself.
As we also saw in Part 3, the digital-to-analogue converter (DAC) has to smooth out the digital signal to convert it from a “staircase” wave to a smoother one. That’s also done with a low-pass filter that eliminates all the harmonics that would be required to make the staircase have sharp corners. Since this is done to re-construct the analogue signal, it’s called a “reconstruction filter“.
This means that, if we pull apart some of the components in the signal chain I showed in Figure 1, it really looks more like this:
I’ve been debating writing a series of postings about “high resolution” audio for a long time – years. Lately, (probably because of some hype generated by some recent press releases) I’ve been getting lots of question (no, that’s not a typo) about it, so it appears the time has come…
To start: the question that I get (a lot) is “If I can’t hear above 20 kHz, then what’s the use of high-res?” As I’ll explain as we go through, this is only one, rather small aspect to consider in this topic. In fact, it might be the least important issue to consider.
However, before I write too much, I’ll say that I’m not going to argue for or against higher resolutions in digital audio systems. I’m only going to go through a bunch of issues that can be used to argue either for or against them. So, there’s not going to be a big reveal at the end of this series telling you that high-res is either better, worse, or no different than whatever you’re using now. It’s merely going to be a discussion of a number of issues that need to be weighed. The problem is that this entire topic is complicated – and there’s no single “right” answer, as I’ll argue as we go along.
To start, let’s get down to basics and look (once again, from the perspectives of this website) at what sound is, and how it’s converted from an analogue electrical signal into a digital representation. The good thing is that I’ve written this introduction before in a different series of postings. So, I’m going to be extremely lazy and just copy-and-paste that information here. I’m not just referring you to another page because I’m intentionally leaving some things out because we’re headed into having a different discussion this time.
A quick introduction to sound
At the simplest level, sound can be described as a small change in air pressure (or barometric pressure) over short periods of time. If you’d like to have a better and more edu-tain-y version of this statement with animations and pretty colours, you could take 10 minutes to watch this video, for example.
That change in pressure can be “captured” by using a microphone, that is (at the simplest level) a device that has a change in air pressure at its input and a change in electrical voltage at its output. Ignoring a lot of details, we could say that if you were to plot a measurement of the air pressure (at the input of the microphone) over time, and you were to compare it to a plot of the measurement of the voltage (at the output of the microphone) over time, you would see the same curve on the two graphs. This means that the change in voltage is analogous to the change in air pressure.
At this point in the conversation, I’ll make a point to say that, in theory, we could “zoom in” on either of those two curves shown in Figure 1 and see more and more details. This is like looking at a map of Canada – it has lots of crinkly, jagged lines. If you zoom in and look at the map of Newfoundland and Labrador, you’ll see that it has finer, crinkly, jagged lines. If you zoom in further, and stand where the water meets the shore in Trepassey and take a photo of your feet, you could copy it to draw a map of the line of where the water comes in around the rocks – and your toes – and you would wind up with even finer, crinkly, jagged lines… You could take this even further and get down to a microscopic or molecular level – but you get the idea… The point is that, in theory, both of the plots in Figure 1 have infinite resolution, both in time and in air pressure or voltage.
Now, let’s say that you wanted to take that microphone’s output and transmit it through a bunch of devices and wires that, in theory, all do nothing to the signal. Let’s say, for example, that you take the mic’s output, send it through a wire to a box that makes the signal twice as loud. Then take the output of that box and send it through a wire to another box that makes it half as loud. You take the output of that box and send it through a wire to a measuring device. What will you see? Unfortunately, none of the wires or boxes in the chain can be perfect, so you’ll probably see the signal plus something else which we’ll call the “error” in the system’s output. We can call it the error because, if we measure the input voltage and the output voltage at any one instant, we’ll probably see that they’re not identical. Since they should be identical, then the system must be making a mistake in transmitting the signal – so it makes errors…
Pedantic Sidebar: Some people will call that error that the system adds to the signal “noise” – but I’m not going to call it that. This is because “noise” is a specific thing – noise is random – so if it’s not random, it’s not noise. Also, although the signal has been distorted (in that the output of the system is not identical to the input) I won’t call it “distortion” either, since distortion is a name that’s given to something that happens to the signal because the signal is there. (We would probably get at least some of the error out of our system even if we didn’t send any audio into it.) So, we could be slightly geeky and adequately vague and call the extra stuff “Distortion plus noise” but not “THD+N” – which stands for “Total Harmonic Distortion Plus Noise” – because not all kinds of distortion will produce a harmonic of the signal… but I’m getting ahead of myself…
So, we want to transmit (or store) the audio signal – but we want to reduce the noise caused by the transmission (or storage) system. One way to do this is to spend more money on your system. Use wires with better shielding, amplifiers with lower noise floors, bigger power supplies so that you don’t come close to their limits, run your magnetic tape twice as fast, and so on and so on. Or, you could convert the analogue signal (remember that it’s analogous to the change in air pressure over time) to one that is represented (and therefore transmitted or stored) digitally instead.
What does this mean?
Conversion from analogue to digital and back (but skipping important details)
IMPORTANT: If you read this section, then please read the following postings as well. This is because, in order to keep things simple to start, I’m about to leave out some important details that I’ll add afterwards. However, if you don’t add the details, you could (understandably) jump to some incorrect conclusions (that many others before you have concluded…) So, if you don’t have time to read both sections, please don’t read either of them.
In the example above, we made a varying voltage that was analogous to the varying air pressure. If we wanted to store this, we could do it by varying the amount of magnetism on a wire or a coating on a tape, for example. Or we could cut a wiggly groove in a bit of vinyl that has a similar shape to the curve in the plots in Figure 1. Or, we could do something else: we could get a metronome (or a clock) and make a measurement of the voltage every time the metronome clicks, and write down the measurements.
For example, let’s zoom in on the first little bit of the signal in the plots in Figure 1
We’ll then put on a metronome and make a measurement of the voltage every time we hear the metronome click…
We can then keep the measurements (remembering how often we made them…) and write them down like this:
We can store this series of numbers on a computer’s hard disk, for example. We can then come back tomorrow, and convert the measurements to voltages. First we read the measurements, and create the appropriate voltage…
We then make a “staircase” waveform by “holding” those voltages until the next value comes in.
All we need to do then is to use a low-pass filter to smooth out the hard edges of the staircase.
So, in this example, we’ve gone from an analogue signal (the red curve in Figure 3) to a digital signal (the series of numbers), and back to an analogue signal (the red curve in Figure 7).
In some ways, this is a bit like the way a movie works. When you watch a movie, you see a series of still photographs, probably taken at a rate of 24 pictures (or frames) per second. If you play those photos back at the same rate (24 fps or frames per second), you think you see movement. However, this is because your eyes and brain aren’t fast enough to see 24 individual photos per second – so you are fooled into thinking that things on the screen are moving.
However, digital audio is slightly different from film in two ways:
The sound (equivalent to the movement in the film) is actually happening. It’s not a trick that relies on your ears and brain being too slow.
If, when you were filming the movie, something were to happen between frames (say, the flash of a gunshot, for example) then it would never be caught on film. This is because the photos are discrete moments in time – and what happens between them is lost. However, if something were to make a very, very short sound between two samples (two measurements) in the digital audio signal – it would not be lost. This is because of something that happens at the beginning of the chain that I haven’t described… yet…
However, there are some “artefacts” (a fancy term for “weird errors”) that are present both in film and in digital audio that we should talk about.
The first is an error that happens when you mess around with the rate at which you take the measurements (called the “sampling rate”) or the photos (called the “frame rate”) – and, more importantly, when you need to worry about this. Let’s say that you make a film at 24 fps. If you play this back at a higher frame rate, then things will move very quickly (like old-fashioned baseball movies…). If you play them back at a lower frame rate, then things move in slow motion. So, for things to look “normal” you have to play the movie at the same rate that it was filmed. However, as long as no one is looking, you can transfer the movie as fast as you like. For example, if you wanted to copy the film, you could set up a movie camera so it was pointing at a movie screen and film the film. As long as the movie on the screen is running in sync with the camera, you can do this at any frame rate you like. But you’ll have to watch the copy at the same frame rate as the original film… (Note that this issue is not something that will come up in this series of postings about high resolution audio)
The second is an easy artefact to recognise. If you see a car accelerating from 0 to something fast on film, you’ll see the wheels of the car start to get faster and faster, then, as the car gets faster, the wheels slow down, stop, and then start going backwards… This does not happen in real life (unless you’re in a place lit with flashing lights like fluorescent bulbs or LED’s). I’ll do a posting explaining why this happens – but the thing to remember here is that the speed of the wheel rotation that you see on the film (the one that’s actually captured by the filming…) is not the real rotational speed of the wheel. However, those two rotational speeds are related to each other (and to the frame rate of the film). If you change the real rotational rate or the frame rate, you’ll change the rotational rate in the film. So, we call this effect “aliasing” because it’s a false version (an alias) of the real thing – but it’s always the same alias (assuming you repeat the conditions…) Digital audio can also suffer from aliasing, but in this case, you put in one frequency (which is actually the same as a rotational speed) and you get out another one. This is not the same as harmonic distortion, since the frequency that you get out is due to a relationship between the original frequency and the sampling rate, so the result is almost never a multiple of the input frequency. (We’re going to dig into this a lot deeper through this series of postings about high resolution audio, so if it doesn’t immediately make sense, don’t worry…)
Some important details that I left out…
One of the things I said above was something like “we measure the voltage and store the results” and the example I gave was a nice series of numbers that only had 4 digits after the decimal point. This statement has some implications that we need to discuss.
Let’s say that I have a thing that I need to measure. For example, Figure 8 shows a piece of metal, and I want to measure its width.
Using my ruler, I can see that this piece of metal is about 57 mm wide. However, if I were geeky (and I am) I would say that this is not precise enough – and therefore it’s not accurate. The problem is that my ruler is only graduated in millimetres. So, if I try to measure anything that is not exactly an integer number of mm long, I’ll either have to guess (and be wrong) or round the measurement to the nearest millimetre (and be wrong).
So, if I wanted you to make a piece of metal the same width as my piece of metal, and I used the ruler in Figure 8, we would probably wind up with metal pieces of two different widths. In order to make this better, we need a better ruler – like the one in Figure 9.
Figure 9 shows a vernier caliper (a fancy type of ruler) being used to measure the same piece of metal. The caliper has a resolution of 0.05 mm instead of the 1 mm available on the ruler in Figure 8. So, we can make a much more accurate measurement of the metal because we have a measuring device with a higher precision.
The conversion of a digital audio signal is the same. As I said above, we measure the voltage of the electrical signal, and transmit (or store) the measurement. The question is: how accurate and precise is your measurement? As we saw above, this is (partly) determined by how many digits are in the number that you use when you “write down” the measurement.
Since the voltage measurements in digital audio are recorded in binary rather than decimal (we use 0 and 1 to write down the number instead of 0 up to 9) then we use Binary digITS – or “bits” instead of decimal digits (which are not called “dits”). The number of bits we have in the number that we write down (partly) determines the precision of the measurement of the voltage – and therefore (possibly), our accuracy…
Just like the example of the ruler in Figure 8, above, we have a limited resolution in our measurement. For example, if we had only 4 bits to work with then the waveform in 4 – the one we have to measure – would be measured with the “ruler” shown on the left side of Figure 10, below.
When we do this, we have to round off the value to the nearest “tick” on our ruler, as shown in Figure 11.
Using this “ruler” which gives a write-down-able “quantity” to the measurement, we get the following values for the red staircase:
When we “play these back” we get the staircase again, shown in Figure 12.
Of course, this means that, by rounding off the values, we have introduced an error in the system (just like the measurement in Figure 8 has a bigger error than the one in Figure 9). We can calculate this error if we just subtract the original signal from the output signal (in other words, Figure 12 minus Figure 10) to get Figure 13.
In order to improve our accuracy of the measurement, we have to increase the precision of the values. We can do this by adding an extra digit (or bit) to the number that we use to record the value.
If we were using decimal numbers (0-9) then adding an extra digit to the number would give us 10 times as many possibilities. (For example, if we were using 4 digits after the decimal in the example at the start of this posting, we have a total of 10,000 possible values – 0.0000 to 0.9999. If we add one more digit, we increase the resolution to 100,000 possible values – 0.00000 to 0.99999 ).
In binary, adding one extra digit gives us twice as many “ticks” on the ruler. So, using 4 bits gives us 16 possible values. Increasing to 5 bits gives us 32 possible values.
If you’re listening to a CD, then the individual measurements of each voltage – the “sample values” – are stored with 16 bits, which means that we have 65,536 possible values to pick from.
Remember that this means that we have more “ticks” on our ruler – but we don’t necessarily increase its range. So, for example, we’re still measuring a voltage from -1 V to 1 V – we just have more and more resolution with which we can do that measurement.